mara004's repositories
backports.cached_property
Python 3.8 functools.cached_property backport to Python 3.6
benchmarks
Benchmarking PDF libraries
cpython
The Python programming language
ctypesgen
Pure-Python wrapper generator for ctypes
deskew
Library used to deskew a scanned document
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
ImageMagick
🧙‍♂️ ImageMagick 7
camelot
A Python library to extract tabular data from PDFs
docling
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
JSPyBridge
🌉 Bridge to interoperate Node.js and Python
nv-ingest
NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
OCR-Form-Tools
A set of tools to use in Microsoft Azure Form Recognizer and OCR services.
pdfium-binaries
📰 Binary distribution of PDFium
pdfium-binaries-feedstock
Repack of pdfium2 binaries for macOS, Linux and Windows.
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
pdftext
Extract structured text from PDFs quickly
pikepdf
A Python library for reading and writing PDF, powered by qpdf
pypdfium2
Python bindings to PDFium
scantailor-libs-build
Building scantailor and its dependencies
spek
Acoustic spectrum analyser
test_workflows
Personal experiments with GH workflows
yt-dlp
A youtube-dl fork with additional features and fixes