Feature: Switch to turn off page label support

lvsass opened this issue 2 years ago

Hi!

For my specific use case it would be great to have an option to have pdfminer ignore page labels.

At the moment I am using a script that, in the resulting markdown file, adds links to the specific page in the PDF, like so:

[Page 2](<file.pdf#page=2>)

Obviously, the page labels often don't correspond to the actual page number in the file, which would make this type of switch useful.
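To illustrate the kind of link described above, here is a minimal sketch that builds the Markdown link from a physical page index rather than a page label. The function name `page_link` and its parameters are hypothetical and not part of any existing tool; the sketch only assumes that the `#page=N` fragment counts physical pages starting at 1.

```python
def page_link(pdf_name: str, page_index: int) -> str:
    """Build a Markdown link to a physical page of a PDF.

    page_index is assumed to be 0-based (the first page in the file is 0),
    while the #page= fragment is 1-based, so we add one.
    """
    page_number = page_index + 1
    # Angle brackets keep the link valid even if the file name contains spaces.
    return f"[Page {page_number}](<{pdf_name}#page={page_number}>)"


if __name__ == "__main__":
    print(page_link("file.pdf", 1))
```

Because the link is derived from the page's position in the file, it stays correct even when the document's page labels (for example roman numerals in front matter) differ from the physical numbering.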

Andrew Baumann · Answer 1 · Wed Jan 25 2023 19:40:16 GMT+0800 (China Standard Time)

I agree that ignoring labels makes sense (they might be nonsense), but I'm pretty nervous about your script. Trying to parse and modify the markdown sounds pretty fragile. Wouldn't you be better off implementing what you need with the JSON output, or maybe a custom formatter?

lvsass · Answer 2 · Wed Jan 25 2023 19:58:38 GMT+0800 (China Standard Time)

> but I'm pretty nervous about your script. Trying to parse and modify the markdown sounds pretty fragile.

You're not wrong: it is fragile, and it is just a means of automating, via text substitution, what I would otherwise do later by hand. So far it has worked well.

> Wouldn't you be better off implementing what you need with the JSON output, or maybe a custom formatter?

To be honest, I had never looked at the JSON output before – it looks promising but in order to turn this into a script I would first have to learn JSON… And I'm not even sure I know what you mean by custom formatter.

My non-existent coding skills aside, I feel like some kind of implementation of page numbers as they appear in the file would be sensible – if not a switch to ignore them, maybe a switch to show both page labels and "regular" page numbers.
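For readers in the same position, reading JSON from Python needs only the standard `json` module. The following is a rough sketch, not the tool's actual output format: the sample data and the field names `page`, `page_label`, and `text` are assumptions made purely for illustration, and the real JSON output may be structured differently. It shows both the page label and the physical page number side by side, as suggested above.

```python
import json

# Hypothetical sample data; the real JSON output may use different field names.
SAMPLE = """
[
  {"page": 2, "page_label": "ii", "text": "First highlight"},
  {"page": 5, "page_label": "1", "text": "Second highlight"}
]
"""


def to_markdown(json_text: str, pdf_name: str) -> str:
    """Turn a list of (assumed) annotation records into Markdown bullets."""
    records = json.loads(json_text)  # parse the JSON text into Python lists/dicts
    lines = []
    for rec in records:
        number = rec["page"]       # assumed: 1-based physical page number
        label = rec["page_label"]  # assumed: the label printed on the page
        text = rec["text"]
        link = f"[Page {number}](<{pdf_name}#page={number}>)"
        lines.append(f"* {text} ({link}, label {label})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(SAMPLE, "file.pdf"))
```

Working from structured data like this avoids having to search and replace inside already-generated Markdown, which is the fragile part of the text-substitution approach.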

Andrew Baumann · Answer 3 · Thu Jan 26 2023 18:22:10 GMT+0800 (China Standard Time)

Yes, I agree with the suggestion of a switch to ignore page labels. I'll get around to it ... eventually :)

lvsass · Answer 4 · Thu Jan 26 2023 22:13:42 GMT+0800 (China Standard Time)

Sounds good!