serialize_pdf

Convert pdf documents into json object. It provides additional methods for searching within the document using regex and returns the found text with bounding box information

from serialize_pdf import serialize_pdf pdf = serialize_pdf.serialize('document.pdf')

# find value regular expression in the document kvs = pdf.get_kv(key='Net sales',val_regex = 'Net sales.{1,20}[0-9,]*')

#

Free software: MIT license
Documentation: https://serialize-pdf.readthedocs.io.

Features

TODO

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template. PDF Serialize - https://github.com/JoshData/pdf-diff

About

Convert pdf document into json object using pdftotext with bbox layout info. The returned object has simple methods to search pdf document based on key and layout position

MIT License

Languages

Language:Python 85.4%Language:Makefile 14.6%