modesty / pdf2json

Converts binary PDF to JSON and text, for server-side PDF processing and command-line use.

Home Page: https://github.com/modesty/pdf2json

Content output gives line break on dash -

dhenson02 opened this issue

This concerns the -c, --content option. I know it's experimental, but it still needs bug reports.

Wherever a dash character (-) appears in the document, the content output inserts a line break before and after it.

To reproduce, I used http://static.e-publishing.af.mil/production/1/af_sg/publication/afi41-210/afi41-210.pdf and ran this command on Debian:

node pdf2json.js -f /home/user/afi...pdf -o /home/user -c

Example:

ORIGINAL

If the data is stored on a facility-shared computer drive, the drive or data folder must be locked so unauthorized users are prevented from gaining access to the information.

OUTPUT

If the data is stored on a facility
-
shared computer drive, the drive or ...

I didn't see this issue already listed, but if I'm duplicating someone or just using the tool incorrectly, please feel free to close it.

PS - thank you so very much for this code - it's exactly what I've been looking for.

Fixed in v1.0.9.
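
For content files already produced by an affected version, the stray breaks can be repaired after the fact. The following is only a minimal sketch of such a workaround, not part of pdf2json: the function name rejoinDashBreaks is hypothetical, and it assumes the bug always leaves the dash alone on its own line, as in the OUTPUT example in this report.

```javascript
// Hypothetical helper, not part of pdf2json.
// Rejoins text where a dash was emitted alone on its own line,
// e.g. "facility\n-\nshared" becomes "facility-shared".
function rejoinDashBreaks(text) {
  // Match: optional trailing spaces, a line break, a lone "-",
  // optional spaces, and the following line break.
  // \r?\n covers both Unix and Windows line endings.
  return text.replace(/[ \t]*\r?\n-[ \t]*\r?\n/g, "-");
}

// Sample taken from the OUTPUT example in this issue.
const broken = "If the data is stored on a facility\n-\nshared computer drive";
console.log(rejoinDashBreaks(broken));
```

A line that legitimately contains only a dash (for example, a separator in the source document) would also be merged into its neighbours, so this is best treated as a stopgap rather than a replacement for upgrading.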