Automatic Instructions Downloader #45
Labels
No Label
Compat/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
Unraid
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: FrederikBaerentsen/BrickTracker#45
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Started work on instructions downloader in separate repo https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader
Currently the script will prefer V29 over V39 (A4 over US Letter).
Naming of files is a mess.
V29
orV 29
is in the name/
is in the name (assuming multiple PDFs for a big set like1/2
and2/2
)x/y
) if more than 1 PDFV29
is there, download all the PDFs and name them{set_id}+{index + 1}.pdf
. If only 1 PDF can be found, just name it{set_id}.pdf
V29
is not there, name all the PDFs as-is, but replace/
with-
, space with-
, remove,
and.
.Examples:
LEGO Building Instructions for 21303-1 WALL•E [Original Version] - BI 3018, 124+4/115+150g SILK, 21303 V29
LEGO Building Instructions for 4866-1 The Knight Bus - BI 3005/60+4-4866 V 29/39
LEGO Building Instructions for 10313-1 Wildflower Bouquet - BI 3103, 56, 10313 V29
=>10313-1.pdf
LEGO Building Instructions for 76384-1 Hogwarts Moment: Herbology Class - BI 3105, 80+4, 76384 V29
=>76384-1.pdf
LEGO Building Instructions for 10280-1 Flower Bouquet - BI 3106, 80+4, 10280 V29
=>10280-1.pdf
LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3101, 60+4, 43251 1/2 V29
=>43251-1+1.pdf
LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3105, 100+4 43251 2/2 V29
=>43251-1+2.pdf
LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 80+4, 76925 2/2 V29
=>76925-1+1.pdf
LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 84+4 76925 1/2 V29
=>76925-1+1.pdf
To do
https://lego.brickinstructions.com/lego_instructions/set/{set_id}
(without-1
)<div id="instructionsContainer">
First version implemented in
053bf75e05
./add
/sets/<>/details
page/instructions
pageSince you have not made it an official release, you should probably have worked and kept that code in a different branch.
If other people are working on other stuff, we base our work off of
master
, and now it has changed but we don't know if the changes are the final version 🙃Okay, i see. I'm not used to working with git and other people. I'll roll back and create a new branch for that code.
No worries at all 🙂
I created an instructions branch and moved those two commits over there. Hopefully that should fix the issues. I'll work on that branch until i have a finished version.
The downloader works and it is possible to select which instructions you want downloaded. They are downloaded to the correct folder and then the rename/delete works as with manually uploaded instructions.
I've been trying to understand the socket implementation for a progress bar, but I'm pretty lost in your implementation unfortunately.
The socket is not really generic, it's really tailor-made to import sets from Rebrickable. I can't have a look at it if you want.
Would be great if you could take a look.
What's implemented is:
/instructions
pageNeed to have:
Nice to have:
If you could help with the socket/progressbar/status it would be great. I'm gonna take a stab at the alternative instructions site later this weekend.
I checked your code and unfortunately I'm a bit bummed out that it doesn't draw any inspiration of the code cleanliness that I've added to your project over the last 3 weeks since I started rewriting.
You have unused imports, put back hardcoded values in the code when there is an file where you can easily add any environment variable. There is also half of the logic (the bs4 part) that is hanging in the views rather than being implemented in the models.
I know that you are not a developer by trade (I am not either) and that it is your personal project but I don't see myself spending any more time cleaning up your code. We have a very different approach to code and our approach may not be compatible in the long run. I know I have strict standards that may not be suitable for everyone.
No hard feelings, I'll fork my work over to Github to continue adding the feature I wanted in my own way. The code will remain open source obviously.
Sorry you feel this way. I'm really trying to learn and I wouldn't call my import code production ready. I had plans to go through it this weekend. Especially the things you mention (bs4) and import felt wrong where it was left. I have about 2-3 hours a week (between job and kids) to code.
I should have spend more time learning your setup before trying to implement things.
No worries, I know I'm not the easiest to work with. I thought your code was ready to ship 🙂.
Let's release
1.1.0
without the downloader and I'll give you pointers on how to improve your code and integrate it, so that everyone wins 👍.Thank you 😃 I moved some code around, but there's still a few things i need sorted. I'll work on it tonight and see how it goes.
@gregoo Thanks for pushing me to read the code again. I pushed changes to the instructions branch.
Notes
/template/set/card.html
to the download page./template/instructions.html
with download button./template/instructions/download.html
with UI and reusing the same style as/template/instructions/download.html
and/template/add.html
Value is populated if user is coming from the set page.REBRICKABLE_LINK_INSTRUCTIONS_PATTERN
in.env
/.env.sample
andconfig.py
so the link can be changed easily.REBRICKABLE_USER_AGENT
in.env
/.env.sample
andconfig.py
so the user-agent can be changed easily.In
views/instructions.py
download
do_download
confirm_download
I'm not sure how to best parse the set_id. Currently it's done using BrickInstructions(set_id) but this feels wrong as it should be a filename that's parsed (as far as i understand from the code).
I don't know if BrickInstructions should even be used for this, or if the download functions should be moved to its own BrickInstructionsDownload class. In
BrickInstructions("").get_list(request.form)
I even pass an empty string, as I don't really have anything else to parse.In
instructions.py
url_for_instructions
find_instructions
LEGO Building Instructions
. This isn't optimal, but I'm not sure how else to implement it. There's no API access and we have scrape the data. The search string could be an env variable.get_list
download
I'm not sure if more logic should be moved from the view to the instructions file like
Wow. I'm genuinely impressed by the changes. Good job 🤩!
set_num
is gone for setsAs part of the first database schema upgrade, I upgraded the column name for a set with
set
being the combination ofnumber
-version
:107-1
number
being the set number:107
version
being the set version:1
This separation is useful to properly sort by set number, otherwise the database doesn't compare number with numbers, but strings with strings, and it is a bit dumb 😂.
To make your changes consistent with the rest of the code, I would rename all the
set_num
toset
.Whitespaces in blank lines
I don't know which editor you are using but it doesn't seem to trim the blank line when you save. This leaves a lot of whitespaces in blank lines, which should be avoided.
For instance:
response.ok
In other parts of the code where I'm using
requests
, I'm checkingresponse.ok
rather than checking specifically for a code200
. I think you should use that to account for more return codes that are valid and not200
.Downloading files
From what I've read on the Internet, the most efficient way to grab a file is to
request
it as astream
and thencopyfileobj
it. Not sure how true this is but that's how I download the images, and even the CSV files.request.args.get
in the templatesI would not use
request.args.get
directly in the template as it is an user supplied value. I think it would be better to grab it in the view, maybe parse it withRebrickableSet.parse_number
(which I realize is a wrong name now 😅) and if it is not raising exception inject it to the template via the context. For instanceDownload instruction will be hidden if there are existing instructions
On a set card, where yo placed the Download link is inside the block if there are no instructions. Meaning that the link will disappear as soon as there is an instruction file.
I don't know if it is by design but I would move it outside of that block so that it is permanently available (it will need a new
is_authenticated
block as well).There is no feedback on the download page if there are no instruction found
For instance if I type
1642
I'm back to the same page.Object-oriented code is not easy
I'm seeing some confusion in the use of
self
, and I think you want too hard to make the functions part of the class 😂.Your
find_instructions
andget_list
functions are not using any property of the object, and theurl_for_instructions
one is trying too hard to copy the one from the sets, but it is used only once. If you really want this code to be part of theBrickInstructions
class, you can make them static methods, which do not require to instantiate an object, and does not useself
.download
is correct as a instance function since it usesself.path
I would also consolidate them since they are not called more than once.
Finding instructions to download
I wouldn't worry too much about it being optimal or not. Does it work for you? If yes then it is a good first version 🙂. Wait for any bug report to improve upon your code, don't be afraid to release things that do work 👍.
I made two commits after yours to illustrate precisely all that #50 😂
It adds the feedback I was missing as well
File naming and detection
For the file naming, maybe the easiest thing to do would be to simply download the files with the
alt
names that you are extracting (without stripping the 'Lego build instructions' and make theBrickInstructions
object a bit smarter in the way it detects sets in filename.It would be easier to add all the variants here:
And maybe allow people to define specific prefixes they may have in a configuration variable?
Thank you 😃 I learned a lot of good things.
This makes sense. Thanks for the heads up.
I'm using VSCode for the first time. I found the setting to trim whitespace. Hopefully that should fix it. Normally i use Vim, but with this many files i wanted some GUI.
Gotcha. Makes sense.
Alright. Never knew there was multiple ways 😅 Learning something new every day.
I'm not completely sure I understand what you mean, but i see the changes in the code and that makes sense for me at least!
This was by design, as the upload button also disappears but I wasn't completely sure I liked it that way, so I'm happy with the change you made.
Yeah, this was one of the things i wasn't sure how to properly implement. I like the feedback on the
/add
page, but didn't have the time to fully understand the implementation yesterday.Yeah, it's a decade since i had object-oriented programming in college, so I'm rusty. Mainly been doing python/bash scripts since.
Oh i remember the term
static
but that's something I gotta do some reading on together with OOP.Nice, your commit makes a lot of sense.
😅 Thanks.
Your PR has some merge conflicts, so I'm not sure how to properly merge it. Gotta read up on that. Would it help to add you as collaborator so you aren't depending on me to approve the merge requests?
I'm gonna try and read up on the socket implementation, as i really want to understand how the progressbar is made.
I had my work based on
master
and not theinstructions
branch that's why the history were mismatching. It should be better now.I will provide the implementation for the socket so that you have another example. Maybe that will be easier to understand how it works?
As I said the current socket code is only made to load sets from Rebrickable and it will require quite some code to make it work with downloads, or to make it generic.
Ah i see. Merged it now.
That would be a great help!
Something like that I presume (with the progress counter not bugged):
The PR has been merged and I'll update the docker image tomorrow to v1.2.0.
Peeron instructions (PICSL) as an Alternative Source
This feature is currently a work in progress.
If Rebrickable fails to locate the instructions for a set, the system will suggest using Peeron as an alternative source.
Example Flow
Search Rebrickable
Searching an old set on Rebrickable will fail.
Peeron Search Results
When searching Peeron, a selection of available pages will be displayed:
Selecting and Downloading Instructions
Users can select the desired pages, after which a PDF will be created and downloaded:
Error Handling
PDF creation
This unfortunately needs three extra libraries:
To do
The basic functionality is working, but the following tasks remain:
Proof of concept code (in a self-contained python script) can be seen here https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader/src/branch/main/dl_peeron.py. This is before i changed to using BytesIO stream without saving the image locally.
(You should probably make a brand new issue for this upgrade of the feature 😂)
Probably a good idea. I'll open a new one 👍