Automatic Instructions Downloader #45

Closed
opened 2025-01-20 23:05:55 +01:00 by FrederikBaerentsen · 24 comments

Started work on instructions downloader in separate repo https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader

Currently the script will prefer V29 over V39 (A4 over US Letter).

Naming of files is a mess.

  1. Check if V29 or V 29 is in the name
  2. Check if / is in the name (assuming multiple PDFs for a big set like 1/2 and 2/2)
  3. Ignore numbers over 10. No set should have 10 PDFs.
  4. Add them all to a list
  5. Sort the list on x (x/y) if more than 1 PDF
  6. if V29 is there, download all the PDFs and name them {set_id}+{index + 1}.pdf. If only 1 PDF can be found, just name it {set_id}.pdf
  7. If V29 is not there, name all the PDFs as-is, but replace / with -, space with -, remove , and ..

Examples:

  • Errors
    • LEGO Building Instructions for 21303-1 WALL•E [Original Version] - BI 3018, 124+4/115+150g SILK, 21303 V29
    • LEGO Building Instructions for 4866-1 The Knight Bus - BI 3005/60+4-4866 V 29/39
  • Works:
    • LEGO Building Instructions for 10313-1 Wildflower Bouquet - BI 3103, 56, 10313 V29 => 10313-1.pdf
    • LEGO Building Instructions for 76384-1 Hogwarts Moment: Herbology Class - BI 3105, 80+4, 76384 V29 => 76384-1.pdf
    • LEGO Building Instructions for 10280-1 Flower Bouquet - BI 3106, 80+4, 10280 V29 => 10280-1.pdf
    • LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3101, 60+4, 43251 1/2 V29 => 43251-1+1.pdf
    • LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3105, 100+4 43251 2/2 V29 => 43251-1+2.pdf
    • LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 80+4, 76925 2/2 V29 => 76925-1+1.pdf
    • LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 84+4 76925 1/2 V29 => 76925-1+1.pdf

To do

  • Cleanup naming and test more cases.
  • If no links are found:
    1. Check https://lego.brickinstructions.com/lego_instructions/set/{set_id} (without -1)
    2. Search for <div id="instructionsContainer">
    3. Grab all
    <img src='https://lego.brickinstructions.com/instructions/01000/1818/001.jpg' class='instructionMainImageLandscape' title='LEGO 1818 Aircraft and Ground Support Equipment and Vehicle. Instructions Page 1' />
    
    1. Combine to PDF
Started work on instructions downloader in separate repo https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader Currently the script will prefer V29 over V39 (A4 over US Letter). Naming of files is a mess. 1. Check if `V29` or `V 29` is in the name 2. Check if `/` is in the name (assuming multiple PDFs for a big set like `1/2` and `2/2`) 3. Ignore numbers over 10. No set should have 10 PDFs. 4. Add them all to a list 5. Sort the list on x (`x/y`) if more than 1 PDF 6. if `V29` is there, download all the PDFs and name them `{set_id}+{index + 1}.pdf`. If only 1 PDF can be found, just name it `{set_id}.pdf` 7. If `V29` is not there, name all the PDFs as-is, but replace `/` with `-`, space with `-`, remove `,` and `.`. ## Examples: - Errors - `LEGO Building Instructions for 21303-1 WALL•E [Original Version] - BI 3018, 124+4/115+150g SILK, 21303 V29` - `LEGO Building Instructions for 4866-1 The Knight Bus - BI 3005/60+4-4866 V 29/39` - Works: - `LEGO Building Instructions for 10313-1 Wildflower Bouquet - BI 3103, 56, 10313 V29` => `10313-1.pdf` - `LEGO Building Instructions for 76384-1 Hogwarts Moment: Herbology Class - BI 3105, 80+4, 76384 V29` => `76384-1.pdf` - `LEGO Building Instructions for 10280-1 Flower Bouquet - BI 3106, 80+4, 10280 V29` => `10280-1.pdf` - `LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3101, 60+4, 43251 1/2 V29` => `43251-1+1.pdf` - `LEGO Building Instructions for 43251-1 Antonio's Animal Sanctuary - BI 3105, 100+4 43251 2/2 V29` => `43251-1+2.pdf` - `LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 80+4, 76925 2/2 V29` => `76925-1+1.pdf` - `LEGO Building Instructions for 76925-1 Aston Martin Vantage Safety Car & AMR23 - BI 3101, 84+4 76925 1/2 V29` => `76925-1+1.pdf` ## To do - [ ] Cleanup naming and test more cases. - [ ] If no links are found: 1. Check `https://lego.brickinstructions.com/lego_instructions/set/{set_id}` (without `-1`) 2. Search for `<div id="instructionsContainer">` 3. Grab all ```` <img src='https://lego.brickinstructions.com/instructions/01000/1818/001.jpg' class='instructionMainImageLandscape' title='LEGO 1818 Aircraft and Ground Support Equipment and Vehicle. Instructions Page 1' /> ```` 4. Combine to PDF
FrederikBaerentsen added the
Kind/Feature
label 2025-01-20 23:05:55 +01:00
FrederikBaerentsen self-assigned this 2025-01-20 23:05:55 +01:00
FrederikBaerentsen added this to the General Improvements project 2025-01-20 23:05:55 +01:00
Author
Owner

First version implemented in 053bf75e05.

First version implemented in 053bf75e05.
Author
Owner
  • Add progress bar in style of /add
  • Error on not instructions found
  • Error on failed download
  • Success on correct download
  • Not sure if I should:
    • stay on page after download with link to set page
    • return to if i came from /sets/<>/details page
    • return to instructions list if i came from the /instructions page
- [ ] Add progress bar in style of `/add` - [ ] Error on not instructions found - [ ] Error on failed download - [ ] Success on correct download - Not sure if I should: - stay on page after download with link to set page - return to if i came from `/sets/<>/details` page - return to instructions list if i came from the `/instructions` page
Collaborator

Since you have not made it an official release, you should probably have worked and kept that code in a different branch.
If other people are working on other stuff, we base our work off of master, and now it has changed but we don't know if the changes are the final version 🙃

Since you have not made it an official release, you should probably have worked and kept that code in a different branch. If other people are working on other stuff, we base our work off of `master`, and now it has changed but we don't know if the changes are the final version 🙃
Author
Owner

Okay, i see. I'm not used to working with git and other people. I'll roll back and create a new branch for that code.

Okay, i see. I'm not used to working with git and other people. I'll roll back and create a new branch for that code.
Collaborator

No worries at all 🙂

No worries at all 🙂
Author
Owner

I created an instructions branch and moved those two commits over there. Hopefully that should fix the issues. I'll work on that branch until i have a finished version.

I created an instructions branch and moved those two commits over there. Hopefully that should fix the issues. I'll work on that branch until i have a finished version.
Author
Owner

The downloader works and it is possible to select which instructions you want downloaded. They are downloaded to the correct folder and then the rename/delete works as with manually uploaded instructions.

I've been trying to understand the socket implementation for a progress bar, but I'm pretty lost in your implementation unfortunately.

The downloader works and it is possible to select which instructions you want downloaded. They are downloaded to the correct folder and then the rename/delete works as with manually uploaded instructions. I've been trying to understand the socket implementation for a progress bar, but I'm pretty lost in your implementation unfortunately.
Collaborator

The socket is not really generic, it's really tailor-made to import sets from Rebrickable. I can't have a look at it if you want.

The socket is not really generic, it's really tailor-made to import sets from Rebrickable. I can't have a look at it if you want.
Author
Owner

Would be great if you could take a look.

What's implemented is:

  • A button on the set page like the upload button.
  • A button on the /instructions page
  • Prefill set-number when coming from the set page
  • Select and download instructions
  • Remove "LEGO Building Instructions for " from the filename if present.

Need to have:

  • Progress bar with status (failure, no instructions available, success)

Nice to have:

If you could help with the socket/progressbar/status it would be great. I'm gonna take a stab at the alternative instructions site later this weekend.

Would be great if you could take a look. What's implemented is: - A button on the set page like the upload button. - A button on the `/instructions` page - Prefill set-number when coming from the set page - Select and download instructions - Remove "LEGO Building Instructions for " from the filename if present. Need to have: - Progress bar with status (failure, no instructions available, success) Nice to have: - Ability to search a secondary site like http://lego.brickinstructions.com and combine jpgs to a pdf is instructions are found. If you could help with the socket/progressbar/status it would be great. I'm gonna take a stab at the alternative instructions site later this weekend.
Collaborator

I checked your code and unfortunately I'm a bit bummed out that it doesn't draw any inspiration of the code cleanliness that I've added to your project over the last 3 weeks since I started rewriting.

You have unused imports, put back hardcoded values in the code when there is an file where you can easily add any environment variable. There is also half of the logic (the bs4 part) that is hanging in the views rather than being implemented in the models.

I know that you are not a developer by trade (I am not either) and that it is your personal project but I don't see myself spending any more time cleaning up your code. We have a very different approach to code and our approach may not be compatible in the long run. I know I have strict standards that may not be suitable for everyone.

No hard feelings, I'll fork my work over to Github to continue adding the feature I wanted in my own way. The code will remain open source obviously.

I checked your code and unfortunately I'm a bit bummed out that it doesn't draw any inspiration of the code cleanliness that I've added to your project over the last 3 weeks since I started rewriting. You have unused imports, put back hardcoded values in the code when there is an file where you can easily add any environment variable. There is also half of the logic (the bs4 part) that is hanging in the views rather than being implemented in the models. I know that you are not a developer by trade (I am not either) and that it is your personal project but I don't see myself spending any more time cleaning up your code. We have a very different approach to code and our approach may not be compatible in the long run. I know I have strict standards that may not be suitable for everyone. No hard feelings, I'll fork my work over to Github to continue adding the feature I wanted in my own way. The code will remain open source obviously.
Author
Owner

Sorry you feel this way. I'm really trying to learn and I wouldn't call my import code production ready. I had plans to go through it this weekend. Especially the things you mention (bs4) and import felt wrong where it was left. I have about 2-3 hours a week (between job and kids) to code.

I should have spend more time learning your setup before trying to implement things.

Sorry you feel this way. I'm really trying to learn and I wouldn't call my import code production ready. I had plans to go through it this weekend. Especially the things you mention (bs4) and import felt wrong where it was left. I have about 2-3 hours a week (between job and kids) to code. I should have spend more time learning your setup before trying to implement things.
Collaborator

No worries, I know I'm not the easiest to work with. I thought your code was ready to ship 🙂.
Let's release 1.1.0 without the downloader and I'll give you pointers on how to improve your code and integrate it, so that everyone wins 👍.

No worries, I know I'm not the easiest to work with. I thought your code was ready to ship 🙂. Let's release `1.1.0` without the downloader and I'll give you pointers on how to improve your code and integrate it, so that everyone wins 👍.
Author
Owner

Thank you 😃 I moved some code around, but there's still a few things i need sorted. I'll work on it tonight and see how it goes.

Thank you 😃 I moved some code around, but there's still a few things i need sorted. I'll work on it tonight and see how it goes.
Author
Owner

@gregoo Thanks for pushing me to read the code again. I pushed changes to the instructions branch.

Notes

  • Added link in /template/set/card.html to the download page.
  • Updated /template/instructions.html with download button.
  • Created /template/instructions/download.html with UI and reusing the same style as /template/instructions/download.html and /template/add.html Value is populated if user is coming from the set page.
  • Added REBRICKABLE_LINK_INSTRUCTIONS_PATTERN in .env/.env.sample and config.py so the link can be changed easily.
  • Added User-Agent to REBRICKABLE_USER_AGENT in .env/.env.sample and config.py so the user-agent can be changed easily.

In views/instructions.py

  • download
    • Show the downloads page.
  • do_download
    • Get set number from input field and show available instructions for download.
  • confirm_download
    • Start the download process

I'm not sure how to best parse the set_id. Currently it's done using BrickInstructions(set_id) but this feels wrong as it should be a filename that's parsed (as far as i understand from the code).
I don't know if BrickInstructions should even be used for this, or if the download functions should be moved to its own BrickInstructionsDownload class. In BrickInstructions("").get_list(request.form) I even pass an empty string, as I don't really have anything else to parse.

In instructions.py

  • url_for_instructions
    • Using env variable to create the rebrickable link
  • find_instructions
    • Get user-agent from env variable.
    • Parse the html to bs and search for LEGO Building Instructions . This isn't optimal, but I'm not sure how else to implement it. There's no API access and we have scrape the data. The search string could be an env variable.
  • get_list
    • Get the ticked boxes' name and url
  • download
    • Do the download. Using ByteIO together with existing FileStorage to save the files

I'm not sure if more logic should be moved from the view to the instructions file like

# Get list of selected instructions
selected_instructions = BrickInstructions("").get_list(request.form)

# No instructions selected
if not selected_instructions:
	return redirect(url_for('instructions.download'))

  
# Loop over selected instructions and download them
for href, filename in selected_instructions:
	BrickInstructions(f"{filename}.pdf").download(href)
@gregoo Thanks for pushing me to read the code again. I pushed changes to the instructions branch. # Notes - Added link in `/template/set/card.html` to the download page. - Updated `/template/instructions.html` with download button. - Created `/template/instructions/download.html` with UI and reusing the same style as `/template/instructions/download.html` and `/template/add.html` Value is populated if user is coming from the set page. - Added `REBRICKABLE_LINK_INSTRUCTIONS_PATTERN` in `.env`/`.env.sample` and `config.py` so the link can be changed easily. - Added User-Agent to `REBRICKABLE_USER_AGENT` in `.env`/`.env.sample` and `config.py` so the user-agent can be changed easily. - ## In `views/instructions.py` - `download` - Show the downloads page. - `do_download` - Get set number from input field and show available instructions for download. - `confirm_download` - Start the download process I'm not sure how to best parse the set_id. Currently it's done using BrickInstructions(set_id) but this feels wrong as it should be a filename that's parsed (as far as i understand from the code). I don't know if BrickInstructions should even be used for this, or if the download functions should be moved to its own BrickInstructionsDownload class. In `BrickInstructions("").get_list(request.form)` I even pass an empty string, as I don't really have anything else to parse. ## In `instructions.py` - `url_for_instructions` - Using env variable to create the rebrickable link - `find_instructions` - Get user-agent from env variable. - Parse the html to bs and search for `LEGO Building Instructions `. This isn't optimal, but I'm not sure how else to implement it. There's no API access and we have scrape the data. The search string could be an env variable. - `get_list` - Get the ticked boxes' name and url - `download` - Do the download. Using ByteIO together with existing FileStorage to save the files I'm not sure if more logic should be moved from the view to the instructions file like ```` # Get list of selected instructions selected_instructions = BrickInstructions("").get_list(request.form) # No instructions selected if not selected_instructions: return redirect(url_for('instructions.download')) # Loop over selected instructions and download them for href, filename in selected_instructions: BrickInstructions(f"{filename}.pdf").download(href) ````
Collaborator

Wow. I'm genuinely impressed by the changes. Good job 🤩!

set_num is gone for sets

As part of the first database schema upgrade, I upgraded the column name for a set with

  • set being the combination of number-version: 107-1
  • number being the set number: 107
  • version being the set version: 1

This separation is useful to properly sort by set number, otherwise the database doesn't compare number with numbers, but strings with strings, and it is a bit dumb 😂.
To make your changes consistent with the rest of the code, I would rename all the set_num to set.

<input type="text" class="form-control" id="add-set" name="add-set" placeholder="107-1 or 1642-1 or ..." value="{{ request.args.get('set_num', '') }}">

<a class="list-group-item list-group-item-action" href="{{ url_for('instructions.download', set_num=item.fields.number) }}"><i class="ri-download-line"></i> Download instruction from Rebrickable</a>

Whitespaces in blank lines

I don't know which editor you are using but it doesn't seem to trim the blank line when you save. This leaves a lot of whitespaces in blank lines, which should be avoided.

For instance:

image

response.ok

In other parts of the code where I'm using requests, I'm checking response.ok rather than checking specifically for a code 200. I think you should use that to account for more return codes that are valid and not 200.

if response.status_code != 200:

if not response.ok:

Downloading files

From what I've read on the Internet, the most efficient way to grab a file is to request it as a stream and then copyfileobj it. Not sure how true this is but that's how I download the images, and even the CSV files.

# Grab the image
response = requests.get(url, stream=True)
if response.ok:
with open(path, 'wb') as f:
copyfileobj(response.raw, f)
else:
raise DownloadException('could not get image {id} at {url}'.format(
id=self.id(),
url=url,
))

request.args.get in the templates

<input type="text" class="form-control" id="add-set" name="add-set" placeholder="107-1 or 1642-1 or ..." value="{{ request.args.get('set_num', '') }}">

I would not use request.args.get directly in the template as it is an user supplied value. I think it would be better to grab it in the view, maybe parse it with RebrickableSet.parse_number (which I realize is a wrong name now 😅) and if it is not raising exception inject it to the template via the context. For instance

try:
    set = RebrickableSet.parse_number(request.args.get('set', '')))
except Exception:
    set = ''

    return render_template(
        ...,
        set=set
    )

Download instruction will be hidden if there are existing instructions

On a set card, where yo placed the Download link is inside the block if there are no instructions. Meaning that the link will disappear as soon as there is an instruction file.

<a class="list-group-item list-group-item-action" href="{{ url_for('instructions.download', set_num=item.fields.number) }}"><i class="ri-download-line"></i> Download instruction from Rebrickable</a>

I don't know if it is by design but I would move it outside of that block so that it is permanently available (it will need a new is_authenticated block as well).

There is no feedback on the download page if there are no instruction found

For instance if I type 1642 I'm back to the same page.

Object-oriented code is not easy

I'm seeing some confusion in the use of self, and I think you want too hard to make the functions part of the class 😂.

Your find_instructions and get_list functions are not using any property of the object, and the url_for_instructions one is trying too hard to copy the one from the sets, but it is used only once. If you really want this code to be part of the BrickInstructions class, you can make them static methods, which do not require to instantiate an object, and does not use self.

download is correct as a instance function since it uses self.path

I would also consolidate them since they are not called more than once.

Finding instructions to download

I wouldn't worry too much about it being optimal or not. Does it work for you? If yes then it is a good first version 🙂. Wait for any bug report to improve upon your code, don't be afraid to release things that do work 👍.

Wow. I'm genuinely impressed by the changes. Good job 🤩! ### `set_num` is gone for sets As part of the first database schema upgrade, I upgraded the column name for a set with - `set` being the combination of `number`-`version`: `107-1` - `number` being the set number: `107` - `version` being the set version: `1` This separation is useful to properly sort by set number, otherwise the database doesn't compare number with numbers, but strings with strings, and it is a bit dumb 😂. To make your changes consistent with the rest of the code, I would rename all the `set_num` to `set`. https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/templates/instructions/download.html#L14 https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/templates/set/card.html#L51 ### Whitespaces in blank lines I don't know which editor you are using but it doesn't seem to trim the blank line when you save. This leaves a lot of whitespaces in blank lines, which should be avoided. For instance: ![image](/attachments/176625fe-7bd0-41a8-a379-9859da03f20e) ### `response.ok` In other parts of the code where I'm using `requests`, I'm checking `response.ok` rather than checking specifically for a code `200`. I think you should use that to account for more return codes that are valid and not `200`. https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/bricktracker/instructions.py#L145 https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/bricktracker/retired_list.py#L95 ### Downloading files From what I've read on the Internet, the most efficient way to grab a file is to `request` it as a `stream` and then `copyfileobj` it. Not sure how true this is but that's how I download the images, and even the CSV files. https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/bricktracker/rebrickable_image.py#L60-L69 ### `request.args.get` in the templates https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/templates/instructions/download.html#L14 I would not use `request.args.get` directly in the template as it is an user supplied value. I think it would be better to grab it in the view, maybe parse it with `RebrickableSet.parse_number` (which I realize is a wrong name now 😅) and if it is not raising exception inject it to the template via the context. For instance ```python try: set = RebrickableSet.parse_number(request.args.get('set', ''))) except Exception: set = '' return render_template( ..., set=set ) ``` ### Download instruction will be hidden if there are existing instructions On a set card, where yo placed the Download link is inside the block if there are no instructions. Meaning that the link will disappear as soon as there is an instruction file. https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/templates/set/card.html#L51 I don't know if it is by design but I would move it outside of that block so that it is permanently available (it will need a new `is_authenticated` block as well). ### There is no feedback on the download page if there are no instruction found For instance if I type `1642` I'm back to the same page. ### Object-oriented code is not easy I'm seeing some confusion in the use of `self`, and I think you want too hard to make the functions part of the class 😂. Your `find_instructions` and `get_list` functions are not using any property of the object, and the `url_for_instructions` one is trying too hard to copy the one from the sets, but it is used only once. If you really want this code to be part of the `BrickInstructions` class, you can make them static methods, which do not require to instantiate an object, and does not use `self`. `download` is correct as a instance function since it uses `self.path` I would also consolidate them since they are not called more than once. ### Finding instructions to download I wouldn't worry too much about it being optimal or not. Does it work for you? If yes then it is a good first version 🙂. Wait for any bug report to improve upon your code, don't be afraid to release things that do work 👍.
Collaborator

I made two commits after yours to illustrate precisely all that #50 😂

It adds the feedback I was missing as well

image
image

I made two commits after yours to illustrate precisely all that #50 😂 It adds the feedback I was missing as well ![image](/attachments/af0d63bd-4793-46bd-a26f-ea83006d61c0) ![image](/attachments/07357ad1-aaca-4828-9604-a78c62ec4de8)
Collaborator

File naming and detection

For the file naming, maybe the easiest thing to do would be to simply download the files with the alt names that you are extracting (without stripping the 'Lego build instructions' and make the BrickInstructions object a bit smarter in the way it detects sets in filename.

It would be easier to add all the variants here:

# Extract the set number
if self.allowed:
# Normalize special chars to improve set detection
normalized = self.name.replace('_', '-')
normalized = normalized.replace(' ', '-')
splits = normalized.split('-', 2)
if len(splits) >= 2:
try:
# Trying to make sense of each part as integers
int(splits[0])
int(splits[1])
self.set = '-'.join(splits[:2])
except Exception:
pass

And maybe allow people to define specific prefixes they may have in a configuration variable?

### File naming and detection For the file naming, maybe the easiest thing to do would be to simply download the files with the `alt` names that you are extracting (without stripping the 'Lego build instructions' and make the `BrickInstructions` object a bit smarter in the way it detects sets in filename. It would be easier to add all the variants here: https://gitea.baerentsen.space/FrederikBaerentsen/BrickTracker/src/commit/52f73d5bf98b0f4bf62b967ac16db918f905db06/bricktracker/instructions.py#L52-L68 And maybe allow people to define specific prefixes they may have in a configuration variable?
Author
Owner

Wow. I'm genuinely impressed by the changes. Good job 🤩!

Thank you 😃 I learned a lot of good things.

set_num is gone for sets

As part of the first database schema upgrade, I upgraded the column name for a set with

  • set being the combination of number-version: 107-1
  • number being the set number: 107
  • version being the set version: 1

This separation is useful to properly sort by set number, otherwise the database doesn't compare number with numbers, but strings with strings, and it is a bit dumb 😂.
To make your changes consistent with the rest of the code, I would rename all the set_num to set.

This makes sense. Thanks for the heads up.

Whitespaces in blank lines

I don't know which editor you are using but it doesn't seem to trim the blank line when you save. This leaves a lot of whitespaces in blank lines, which should be avoided.

I'm using VSCode for the first time. I found the setting to trim whitespace. Hopefully that should fix it. Normally i use Vim, but with this many files i wanted some GUI.

response.ok

In other parts of the code where I'm using requests, I'm checking response.ok rather than checking specifically for a code 200. I think you should use that to account for more return codes that are valid and not 200.

Gotcha. Makes sense.

Downloading files

From what I've read on the Internet, the most efficient way to grab a file is to request it as a stream and then copyfileobj it. Not sure how true this is but that's how I download the images, and even the CSV files.

Alright. Never knew there was multiple ways 😅 Learning something new every day.

request.args.get in the templates

I would not use request.args.get directly in the template as it is an user supplied value. I think it would be better to grab it in the view, maybe parse it with RebrickableSet.parse_number (which I realize is a wrong name now 😅) and if it is not raising exception inject it to the template via the context. For instance

I'm not completely sure I understand what you mean, but i see the changes in the code and that makes sense for me at least!

Download instruction will be hidden if there are existing instructions

On a set card, where yo placed the Download link is inside the block if there are no instructions. Meaning that the link will disappear as soon as there is an instruction file.

This was by design, as the upload button also disappears but I wasn't completely sure I liked it that way, so I'm happy with the change you made.

There is no feedback on the download page if there are no instruction found

For instance if I type 1642 I'm back to the same page.

Yeah, this was one of the things i wasn't sure how to properly implement. I like the feedback on the /add page, but didn't have the time to fully understand the implementation yesterday.

Object-oriented code is not easy

I'm seeing some confusion in the use of self, and I think you want too hard to make the functions part of the class 😂.

Yeah, it's a decade since i had object-oriented programming in college, so I'm rusty. Mainly been doing python/bash scripts since.

Your find_instructions and get_list functions are not using any property of the object, and the url_for_instructions one is trying too hard to copy the one from the sets, but it is used only once. If you really want this code to be part of the BrickInstructions class, you can make them static methods, which do not require to instantiate an object, and does not use self.

Oh i remember the term static but that's something I gotta do some reading on together with OOP.

download is correct as a instance function since it uses self.path

I would also consolidate them since they are not called more than once.

Nice, your commit makes a lot of sense.

Finding instructions to download

I wouldn't worry too much about it being optimal or not. Does it work for you? If yes then it is a good first version 🙂. Wait for any bug report to improve upon your code, don't be afraid to release things that do work 👍.

😅 Thanks.

Your PR has some merge conflicts, so I'm not sure how to properly merge it. Gotta read up on that. Would it help to add you as collaborator so you aren't depending on me to approve the merge requests?

I'm gonna try and read up on the socket implementation, as i really want to understand how the progressbar is made.

> Wow. I'm genuinely impressed by the changes. Good job 🤩! Thank you 😃 I learned a lot of good things. > ### `set_num` is gone for sets > > As part of the first database schema upgrade, I upgraded the column name for a set with > > - `set` being the combination of `number`-`version`: `107-1` > - `number` being the set number: `107` > - `version` being the set version: `1` > > This separation is useful to properly sort by set number, otherwise the database doesn't compare number with numbers, but strings with strings, and it is a bit dumb 😂. > To make your changes consistent with the rest of the code, I would rename all the `set_num` to `set`. This makes sense. Thanks for the heads up. > ### Whitespaces in blank lines > > I don't know which editor you are using but it doesn't seem to trim the blank line when you save. This leaves a lot of whitespaces in blank lines, which should be avoided. > I'm using VSCode for the first time. I found the setting to trim whitespace. Hopefully that should fix it. Normally i use Vim, but with this many files i wanted some GUI. > ### `response.ok` > > In other parts of the code where I'm using `requests`, I'm checking `response.ok` rather than checking specifically for a code `200`. I think you should use that to account for more return codes that are valid and not `200`. > Gotcha. Makes sense. > ### Downloading files > > From what I've read on the Internet, the most efficient way to grab a file is to `request` it as a `stream` and then `copyfileobj` it. Not sure how true this is but that's how I download the images, and even the CSV files. Alright. Never knew there was multiple ways 😅 Learning something new every day. > ### `request.args.get` in the templates > > I would not use `request.args.get` directly in the template as it is an user supplied value. I think it would be better to grab it in the view, maybe parse it with `RebrickableSet.parse_number` (which I realize is a wrong name now 😅) and if it is not raising exception inject it to the template via the context. For instance I'm not completely sure I understand what you mean, but i see the changes in the code and that makes sense for me at least! > ### Download instruction will be hidden if there are existing instructions > > On a set card, where yo placed the Download link is inside the block if there are no instructions. Meaning that the link will disappear as soon as there is an instruction file. > This was by design, as the upload button also disappears but I wasn't completely sure I liked it that way, so I'm happy with the change you made. > ### There is no feedback on the download page if there are no instruction found > > For instance if I type `1642` I'm back to the same page. Yeah, this was one of the things i wasn't sure how to properly implement. I like the feedback on the `/add` page, but didn't have the time to fully understand the implementation yesterday. > ### Object-oriented code is not easy > > I'm seeing some confusion in the use of `self`, and I think you want too hard to make the functions part of the class 😂. Yeah, it's a decade since i had object-oriented programming in college, so I'm rusty. Mainly been doing python/bash scripts since. > Your `find_instructions` and `get_list` functions are not using any property of the object, and the `url_for_instructions` one is trying too hard to copy the one from the sets, but it is used only once. If you really want this code to be part of the `BrickInstructions` class, you can make them static methods, which do not require to instantiate an object, and does not use `self`. Oh i remember the term `static` but that's something I gotta do some reading on together with OOP. > > `download` is correct as a instance function since it uses `self.path` > > I would also consolidate them since they are not called more than once. Nice, your commit makes a lot of sense. > ### Finding instructions to download > > I wouldn't worry too much about it being optimal or not. Does it work for you? If yes then it is a good first version 🙂. Wait for any bug report to improve upon your code, don't be afraid to release things that do work 👍. > > 😅 Thanks. Your PR has some merge conflicts, so I'm not sure how to properly merge it. Gotta read up on that. Would it help to add you as collaborator so you aren't depending on me to approve the merge requests? I'm gonna try and read up on the socket implementation, as i really want to understand how the progressbar is made.
Collaborator

I had my work based on master and not the instructions branch that's why the history were mismatching. It should be better now.
I will provide the implementation for the socket so that you have another example. Maybe that will be easier to understand how it works?
As I said the current socket code is only made to load sets from Rebrickable and it will require quite some code to make it work with downloads, or to make it generic.

I had my work based on `master` and not the `instructions` branch that's why the history were mismatching. It should be better now. I will provide the implementation for the socket so that you have another example. Maybe that will be easier to understand how it works? As I said the current socket code is only made to load sets from Rebrickable and it will require quite some code to make it work with downloads, or to make it generic.
Author
Owner

Ah i see. Merged it now.

That would be a great help!

Ah i see. Merged it now. That would be a great help!
Collaborator

Something like that I presume (with the progress counter not bugged):

image
image
image
image

Something like that I presume (with the progress counter not bugged): ![image](/attachments/d76bc1fc-d88d-464e-8363-baa7bc59c712) ![image](/attachments/ab211b1c-03fa-481b-ab59-8ad8b4e815ef) ![image](/attachments/67c9db5d-0ded-4ff5-8a95-61b34ffd67e4) ![image](/attachments/b5052ae3-b8ef-4a51-86f6-6ddf060c36ec)
Author
Owner

The PR has been merged and I'll update the docker image tomorrow to v1.2.0.


Peeron instructions (PICSL) as an Alternative Source

This feature is currently a work in progress.

If Rebrickable fails to locate the instructions for a set, the system will suggest using Peeron as an alternative source.

Example Flow

  1. Search Rebrickable
    Searching an old set on Rebrickable will fail.

  2. Peeron Search Results
    When searching Peeron, a selection of available pages will be displayed:

  3. Selecting and Downloading Instructions
    Users can select the desired pages, after which a PDF will be created and downloaded:

  4. Error Handling

    • If Peeron fails to find the set, an error message will be displayed.
    • If Peeron finds the set but cannot locate the instructions, an error message will also be shown.

PDF creation

if img_response.ok:

    img_data = img_response.content
    image = Image.open(BytesIO(img_data))
    width, height = image.size

    # Add the image to the PDF
    pdf.add_page(format=(width * 0.264583, height * 0.264583))  # Convert pixels to mm (1 px = 0.264583 mm)
    pdf.image(BytesIO(img_data), x=0, y=0, w=width * 0.264583, h=height * 0.264583)

# Save the PDF
pdf.output(filename)

This unfortunately needs three extra libraries:

from io import BytesIO
from PIL import Image
from fpdf import FPDF

To do

The basic functionality is working, but the following tasks remain:

  • Add the necessary environment variables (various urls and filename template)
  • Update the socket configuration to support different download urls (or add a new download method to the socket)

Proof of concept code (in a self-contained python script) can be seen here https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader/src/branch/main/dl_peeron.py. This is before i changed to using BytesIO stream without saving the image locally.

The PR has been merged and I'll update the docker image tomorrow to v1.2.0. --- ## Peeron instructions (PICSL) as an Alternative Source This feature is currently a work in progress. If Rebrickable fails to locate the instructions for a set, the system will suggest using Peeron as an alternative source. ### Example Flow 1. **Search Rebrickable** Searching an old set on Rebrickable will fail. ![](https://xbackbone.baerentsen.space/LaMU8/NURiXIFE79.png/raw) 2. **Peeron Search Results** When searching Peeron, a selection of available pages will be displayed: ![](https://xbackbone.baerentsen.space/LaMU8/fIWigifu76.png/raw) 3. **Selecting and Downloading Instructions** Users can select the desired pages, after which a PDF will be created and downloaded: ![](https://xbackbone.baerentsen.space/LaMU8/tANikUlo76.png/raw) 4. **Error Handling** - If Peeron fails to find the set, an error message will be displayed. - If Peeron finds the set but cannot locate the instructions, an error message will also be shown. ### PDF creation ```` if img_response.ok: img_data = img_response.content image = Image.open(BytesIO(img_data)) width, height = image.size # Add the image to the PDF pdf.add_page(format=(width * 0.264583, height * 0.264583)) # Convert pixels to mm (1 px = 0.264583 mm) pdf.image(BytesIO(img_data), x=0, y=0, w=width * 0.264583, h=height * 0.264583) # Save the PDF pdf.output(filename) ```` This unfortunately needs three extra libraries: ```` from io import BytesIO from PIL import Image from fpdf import FPDF ```` ### To do The basic functionality is working, but the following tasks remain: - Add the necessary environment variables (various urls and filename template) - Update the socket configuration to support different download urls (or add a new download method to the socket) Proof of concept code (in a self-contained python script) can be seen here https://gitea.baerentsen.space/FrederikBaerentsen/InstructionsDownloader/src/branch/main/dl_peeron.py. This is before i changed to using BytesIO stream without saving the image locally.
Collaborator

(You should probably make a brand new issue for this upgrade of the feature 😂)

(You should probably make a brand new issue for this upgrade of the feature 😂)
Author
Owner

(You should probably make a brand new issue for this upgrade of the feature 😂)

Probably a good idea. I'll open a new one 👍

> (You should probably make a brand new issue for this upgrade of the feature 😂) Probably a good idea. I'll open a new one 👍
Sign in to join this conversation.
No Milestone
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: FrederikBaerentsen/BrickTracker#45
No description provided.