shirblc / pocketmagstopdf

Download pocketmags magazines in PDF format from the HTML5 reader

PLEASE USE THIS SCRIPT RESPONSIBLY. THE MAGAZINE PUBLISHING INDUSTRY RELIES HEAVILY ON INCOME FROM SALES WITH VERY SLIM PROFIT MARGINS.

Acknowledgements:

This is a modified version of the GitHub Gist called pmdown.py written by the GitHub user rjw57. I would have contributed my changes to the original but alas it is only a Gist, not a GitHub Repository.

With thanks to:

  • rjw57 for the original pmdown.py Python script.
  • bani6809 for revealing in the comments that the "high" and "extrahigh" quality image URLs end in bin, not jpg.
  • shirblc for replacing my collection of Python print statements with proper Python logging.

NB: I have only been able to test this on the small number of magazines I have purchased on pocketmags.com

Feature Additions:

14/07/2022

  • Add the option to enable downloading of magazines in the elusive "high" quality format (only when --quality=high is used, otherwise the default is "mid").
  • Add the option to insert a custom title into the generated PDF's metadata to replace the default of "untitled.pdf".

13/08/2022

  • Add the option to specify a range of pages to download, rather than the whole magazine.
  • Add the option to save images to a separate directory in addition to generating the PDF.
  • Add the option to set a delay between downloading pages in case of any server-imposed rate-limiting.

30/09/2022

  • Add the option to enable downloading of magazines in the Holy-Grail "original" format (only when --quality=original is used, otherwise the default is "mid").
  • Add options to alter the verbosity level of the program's output:
    • --quiet suppresses all output except warnings and errors.
    • No option given will present a normal level of informational output.
    • --debug prints comprehensive PDF-related information.
  • Add the option to hide the User UUID watermark that is inserted on each page of the PDF when --quality=original is used.

09/12/2022

  • Add proper Python Logger support (implemented by shirblc)

19/03/2023

  • Add the option to enable downloading of magazines in the newly-discovered "extrahigh" quality format (only when --quality=extrahigh is used, otherwise the default is "mid").

Usage:

pocketmagstopdf.py (-h | --help)
pocketmagstopdf.py [options] <pdf> <url>

Options:

-h, --help                  Print brief usage summary.

--quality=QUALITY           Set magazine download quality.
                            Choose from extralow, low, mid, high, extrahigh or original. (Optional)
                            [default: mid]

--dpi=DPI                   Set image resolution in dots per inch. (Optional)
                            Not used with '--quality=original'.
                            [default: 150]

--title=TITLE               Set magazine title in the PDF metadata. (Optional)
                            Not used with '--quality=original'.
                            The default value is the filename with:
                                - underscores replaced with spaces
                                - the file extension removed

--range-from=PAGE-FROM      Define a portion of the magazine to download, starting from this page number. (Optional)
                            Downloads from the beginning of the magazine - page 1 - if absent.
                            [default: 1]

--range-to=PAGE-TO          Define a portion of the magazine to download, ending on this page number. (Optional)
                            Downloads to the end of the magazine if absent.
                            [default: 999]

--delay=DELAY               Set the time in seconds to wait between downloading each page of the magazine. (Optional)
                            There is no delay if absent. The value of the delay may be integer or decimal.
                            Used both when probing for the last valid page number of the magazine and
                            between downloading each individual page for all quality settings except 'original'.
                            [default: 0]

--save-images               Save the downloaded JPEG images of the magazine pages to a subdirectory with the same
                            name as the magazine in addition to generating the PDF of the magazine.
                            Not used with '--quality=original'.
                            [default: False]

--image-subdir-prefix=PFX   If --save-images is set, prefix the name of the subdirectory the images are saved to with
                            this string. Blank by default. (Optional)
                            Not used with '--quality=original'.
                            [default: ]

--image-subdir-suffix=SFX   If --save-images is set, suffix the name of the subdirectory the images are saved to with
                            this string. Blank by default. (Optional)
                            Not used with '--quality=original'.
                            [default: ]

--uuid=UUID                 Specifies the User UUID to use to download the PDF when '--quality=original' is used
                            and --uuid-randomise is not used.
                            Read the 'Notes' section below for details of how to find it. (Optional/Required)
                            Only used with '--quality=original'.
                            [default: None]

--uuid-randomise            Uses a random UUID to download the PDF when '--quality=original' is specified. (Optional)
                            [default: False]

--uuid-hide                 Hides the User UUID watermark on each page of the PDF by making it transparent.
                            This option is overridden by '--uuid-destroy'.
                            Only used with '--quality=original' as the watermark is not present on lower quality downloads.
                            [default: False]

--uuid-destroy              Completely wipes the User UUID watermark from each page of the PDF. (Experimental)
                            This option overrides '--uuid-hide'.
                            Only used with '--quality=original' as the watermark is not present on lower quality downloads.
                            [default: False]

--timestamp-change          Alters the timestamp within the downloaded PDF.
                            Only used with '--quality=original'.
                            [default: False]

--quiet                     Suppress printing of all output except warning and error messages.
                            [default: False]

--debug                     Print extra output to aid debugging of the program.
                            Setting both '--quiet' and '--debug' is contradictory.
                            If this happens, a warning is issued and the debug setting overrides the quiet setting.
                            [default: False]

<pdf>                       Save output to this file. (Required)
<url>                       A URL to one image from the magazine. (Required)
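
Two of the behaviours described in the options above can be sketched in a few lines of Python: the default title derived from the output filename, and the rule that '--debug' wins over '--quiet'. This is only an illustration of the documented behaviour, not the script's actual code; the helper names `default_title` and `resolve_log_level` are hypothetical.

```python
import logging
from pathlib import Path


def default_title(pdf_path: str) -> str:
    """Derive a PDF title from the output filename (hypothetical helper)."""
    # Drop any directory and the file extension, then replace underscores with spaces.
    return Path(pdf_path).stem.replace("_", " ")


def resolve_log_level(quiet: bool, debug: bool) -> int:
    """Map the --quiet and --debug flags to a logging level (hypothetical helper)."""
    if quiet and debug:
        # Documented behaviour: warn, then let the debug setting override quiet.
        logging.warning("--quiet and --debug are contradictory; using --debug.")
    if debug:
        return logging.DEBUG
    if quiet:
        return logging.WARNING
    return logging.INFO
```

With these helpers, an output file named my_magazine.pdf would get the title "my magazine", and passing both flags would result in debug-level output.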

Examples:

pocketmagstopdf.py --quality=extrahigh --delay=2 --title="My Magazine, Issue 73, October 2022" my_magazine.pdf https://mcdatastore.blob.core.windows.net/mcmags/<STORAGE_BUCKET_UUID>/<ISSUE_UUID>/extralow/0000.jpg

pocketmagstopdf.py --quality=original --delay=0.5 --uuid-hide --uuid=<USER_UUID> my_magazine.pdf https://mcdatastore.blob.core.windows.net/mcmags/<STORAGE_BUCKET_UUID>/<ISSUE_UUID>/extralow/0000.jpg
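
As a rough illustration of how '--range-from', '--range-to' and '--delay' interact, the sketch below walks the requested page range, stops at the first page that does not exist, and pauses between requests. It assumes pages are probed one at a time in order, which may differ from what the script really does; `pages_to_download` and the `page_exists` callback are hypothetical names, and the callback stands in for whatever HTTP check is used.

```python
import time
from typing import Callable, List


def pages_to_download(
    page_exists: Callable[[int], bool],
    range_from: int = 1,
    range_to: int = 999,
    delay: float = 0.0,
) -> List[int]:
    """Return the page numbers inside the requested range that actually exist."""
    pages: List[int] = []
    for page in range(range_from, range_to + 1):
        # Assumption: the first missing page marks the end of the magazine.
        if not page_exists(page):
            break
        pages.append(page)
        if delay > 0:
            # Wait between requests in case the server rate-limits downloads.
            time.sleep(delay)
    return pages
```

Because the default upper bound is 999, leaving '--range-to' unset simply means "keep going until a page is missing" under this assumption.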

Notes:

PLEASE USE THIS SCRIPT RESPONSIBLY. THE MAGAZINE PUBLISHING INDUSTRY RELIES HEAVILY ON INCOME FROM SALES WITH VERY SLIM PROFIT MARGINS.

URLs for pocketmags images and User UUIDs can be found by opening a magazine in the HTML5 reader, right-clicking on a page and selecting "inspect element". Look for URLs of the form:

https://mcdatastore.blob.core.windows.net/mcmags/<uuid1>/<uuid2>/extralow/<num>.jpg

where <uuid{1,2}> are strings of letters and numbers with dashes separating them and <num> is some 4-digit number.
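
To make the URL form above concrete, here is a minimal sketch that splits such a URL into its parts and rebuilds it for a different image quality. It assumes the layout is exactly as shown, and it uses the observation from the acknowledgements that "high" and "extrahigh" image URLs end in bin rather than jpg. The names `PAGE_URL`, `parse_page_url` and `page_url` are hypothetical and are not taken from the script.

```python
import re

# Assumed layout: <base>/<quality>/<num>.<ext>, where <base> ends in the two UUIDs.
PAGE_URL = re.compile(
    r"^(?P<base>https://[^/]+/mcmags/(?P<uuid1>[0-9A-Za-z-]+)/(?P<uuid2>[0-9A-Za-z-]+))"
    r"/(?P<quality>[a-z]+)/(?P<num>\d{4})\.(?P<ext>jpg|bin)$"
)


def parse_page_url(url: str) -> dict:
    """Split a page image URL into base, UUIDs, quality, number and extension."""
    match = PAGE_URL.match(url)
    if match is None:
        raise ValueError(f"Not a recognised page image URL: {url}")
    return match.groupdict()


def page_url(base: str, quality: str, num: int) -> str:
    """Build the image URL for a given quality and 4-digit image number."""
    # "high" and "extrahigh" images end in bin; the other image qualities in jpg.
    ext = "bin" if quality in ("high", "extrahigh") else "jpg"
    return f"{base}/{quality}/{num:04d}.{ext}"
```

This only covers the image-based qualities; the "original" quality is downloaded as a PDF and is not handled by the sketch.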

The User UUID required for downloading the magazine when '--quality=original' can be found by searching the HTML for the text "userGuid:" and copying the hexadecimal value that follows it without the surrounding single quote characters.
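
If the page source has been saved to a file or copied into a string, the search described above can be done with a short regular expression. This is a sketch that assumes the value appears as userGuid: followed by a single-quoted string of hexadecimal digits and dashes; `USER_GUID` and `find_user_guid` are hypothetical names.

```python
import re
from typing import Optional

# Assumed shape in the page source: userGuid: '<hexadecimal digits and dashes>'
USER_GUID = re.compile(r"userGuid:\s*'([0-9A-Fa-f-]+)'")


def find_user_guid(html: str) -> Optional[str]:
    """Return the User UUID without its surrounding quotes, or None if absent."""
    match = USER_GUID.search(html)
    return match.group(1) if match else None
```

The returned value is what would be passed to '--uuid' when '--quality=original' is used.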

Support this Project:

Buy Me a Coffee at ko-fi.com

License:Creative Commons Zero v1.0 Universal

