ashutoshvarma / pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

Home page: https://pyxpdf.readthedocs.io/

pyxpdf

pyxpdf is a fast and memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.

Features

  • Almost 20x faster than pure-Python PDF parsers (see Speed Comparison)
  • Extract text while preserving the original document layout as closely as possible
  • Support almost all PDF encodings, CMaps and predefined CMaps.
  • Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks, along with their bounding boxes (BBox).
  • Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes.
  • No explicit dependencies (except optional ones; see Installation)
  • Thread-safe
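Because pages can be rendered in several color modes, it can help to estimate how large a rendered page will be before choosing a mode and resolution. The sketch below is plain arithmetic and does not call pyxpdf. It assumes that page sizes are given in PDF points (1/72 inch), that pixel dimensions are rounded to the nearest pixel, and that the bits per pixel for each mode follow the Pillow (PIL) mode conventions. The helper names `render_size` and `raw_buffer_bytes` are hypothetical and are not part of pyxpdf's API.

```python
# Hypothetical helpers for estimating rendered page sizes; not part of pyxpdf.

# Bits per pixel for each color mode listed above. These follow Pillow (PIL)
# mode conventions, which is an assumption about how rendered pixels are stored.
BITS_PER_PIXEL = {
    "1": 1,      # bilevel (black and white)
    "L": 8,      # 8-bit grayscale
    "LA": 16,    # grayscale + alpha
    "RGB": 24,   # 3 channels, 8 bits each
    "RGBA": 32,  # RGB + alpha
    "CMYK": 32,  # 4 channels, 8 bits each
}

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def render_size(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Pixel dimensions of a page (size in points) rendered at `dpi`.

    Rounds to the nearest pixel; the real renderer's rounding may differ.
    """
    # Multiply before dividing so exact cases (e.g. 612 * 150 / 72) stay exact.
    width_px = int(width_pt * dpi / POINTS_PER_INCH + 0.5)
    height_px = int(height_pt * dpi / POINTS_PER_INCH + 0.5)
    return width_px, height_px


def raw_buffer_bytes(width_pt: float, height_pt: float, dpi: float, mode: str) -> int:
    """Uncompressed pixel-buffer size in bytes, padding each row to a whole byte."""
    width_px, height_px = render_size(width_pt, height_pt, dpi)
    row_bytes = (width_px * BITS_PER_PIXEL[mode] + 7) // 8  # integer ceil division
    return row_bytes * height_px


if __name__ == "__main__":
    # US Letter is 612 x 792 points.
    for mode in BITS_PER_PIXEL:
        print(mode, raw_buffer_bytes(612, 792, 150, mode))
```

For a US Letter page (612 x 792 pt) at 150 DPI, this gives 1275 x 1650 pixels: about 6.3 MB uncompressed in 'RGB' mode versus about 0.26 MB in mode '1'. Lower-depth modes can therefore substantially reduce memory use when color is not needed.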

More Information

License

pyxpdf is licensed under the GNU General Public License (GPL), version 2 or 3. See the LICENSE file for details.

Credits