Convert pdf documents into json object. It provides additional methods for searching within the document using regex and returns the found text with bounding box information
from serialize_pdf import serialize_pdf pdf = serialize_pdf.serialize('document.pdf')
# find value regular expression in the document kvs = pdf.get_kv(key='Net sales',val_regex = 'Net sales.{1,20}[0-9,]*')
#
- Free software: MIT license
- Documentation: https://serialize-pdf.readthedocs.io.
- TODO
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template. PDF Serialize - https://github.com/JoshData/pdf-diff