lucasrla / remarks

Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG

Incorrect order of PDF pages with --modified_pdf

daleonpz opened this issue

It generates the PDF with the annotations, but the pages come out in arbitrary order. When I use --combined-pdf instead, it works: the annotations are there and the pages are in order. I tried it with two books, one with a two-column layout and one with a single column.

Here is the order of the page indexes:

Book Writers (2017, Createspace Independent Publishing Platform) - libgen.lc.pdf"
PDF in-device directory: .
-------PAGE IDX #104
-------PAGE IDX #114
-------PAGE IDX #8
-------PAGE IDX #132
-------PAGE IDX #26
-------PAGE IDX #43
-------PAGE IDX #79
-------PAGE IDX #88
-------PAGE IDX #14
-------PAGE IDX #107
-------PAGE IDX #119
-------PAGE IDX #115
-------PAGE IDX #52

One fix would be to record the original index of each page in a list as it is inserted, then sort the document before saving.
Something like this:

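pages_order = []
....
# at remarks.py:180
if modified_pdf:
    mod_pdf.insertPDF(ann_doc, start_at=-1)  # append the annotated page at the end
    pages_order.append(page_idx)  # remember its original page index

# at remarks.py:203
if modified_pdf:
    # positions of the appended pages, sorted by their original page index
    order = sorted(range(len(pages_order)), key=pages_order.__getitem__)
    mod_pdf.select(order)  # keep all pages, reordered in place
    mod_pdf.save(f"{output_dir}/{name} _remarks-only.pdf")
    mod_pdf.close()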
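A minimal, library-free sketch of the sorting step, assuming exactly one page is appended per annotated page and `pages_order` records each appended page's original index. The helper names `sort_permutation` and `reorder` are hypothetical. In PyMuPDF, a permutation like the one returned here can be passed to `Document.select()`, which keeps the listed pages in the given order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sort_permutation(page_indexes: Sequence[int]) -> List[int]:
    """Return append positions ordered by original page index.

    page_indexes[k] is the original page number of the k-th appended page
    (hypothetical helper; assumes one appended page per annotated page).
    """
    return sorted(range(len(page_indexes)), key=lambda k: page_indexes[k])


def reorder(pages: Sequence[T], page_indexes: Sequence[int]) -> List[T]:
    """Put appended pages back into their original reading order."""
    return [pages[k] for k in sort_permutation(page_indexes)]


# First few page indexes from the log, in the order they were appended.
logged = [104, 114, 8, 132, 26]
print(sort_permutation(logged))  # [2, 4, 0, 1, 3]
print(reorder([f"page {i}" for i in logged], logged))
# ['page 8', 'page 26', 'page 104', 'page 114', 'page 132']
```

Computing a permutation (rather than sorting the pages themselves) matches the shape of a select-style call, which takes a list of page positions.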

Alternatively, insert each page at its original index and delete the blank pages afterwards.

For example, at remarks.py:180:

if modified_pdf:
    mod_pdf.insertPDF(ann_doc, start_at=page_idx)

and at remarks.py:203:

if modified_pdf:
    # keep only pages that contain text; build a new list instead of
    # removing items from the list while iterating over it
    keep = [i for i in range(mod_pdf.pageCount) if mod_pdf.getPageText(i)]
    mod_pdf.select(keep)  # select the remaining pages from the PDF
    mod_pdf.save(f"{output_dir}/{name} _remarks-only.pdf")
    mod_pdf.close()
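The blank-page filter can be sketched without PyMuPDF, assuming the text of each page has already been extracted into a list (for example with `getPageText`). `nonblank_page_numbers` is a hypothetical helper, and treating whitespace-only text as blank is an assumption. The sketch also shows why calling `list.remove()` inside a `for` loop over the same list is unsafe: each removal shifts the remaining items left, so the next item is skipped.

```python
from typing import List, Sequence


def nonblank_page_numbers(page_texts: Sequence[str]) -> List[int]:
    """Return the numbers of pages whose extracted text is not empty.

    Hypothetical helper: whitespace-only text is treated as blank.
    """
    return [i for i, text in enumerate(page_texts) if text.strip()]


# Stand-in for per-page extracted text (hypothetical data).
texts = ["", "Chapter 1", "", "", "Notes", ""]
print(nonblank_page_numbers(texts))  # [1, 4]

# Removing while iterating skips the element after each removal,
# so blank page 3 survives.
buggy = list(range(len(texts)))
for i in buggy:
    if not texts[i].strip():
        buggy.remove(i)
print(buggy)  # [1, 3, 4]
```

Note that a page holding only drawn scribbles may have no extractable text, so a text-based test is an assumption about what counts as "blank".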

Hey @daleonpz, thanks for catching and fixing this! I have just merged your PR to master.