John's repositories
docnet
DocNET is a fast PDF editing and reading library for modern .NET applications
finetune
Scikit-learn style model finetuning for NLP
go-pdfium-render
A Go library that uses pdfium (via cgo) to render PDFs to images
PaddleOCR
Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)
paperless-ng
A supercharged version of paperless: scan, index and archive all your physical documents
pdf-anonymizer
A script to anonymize PDFs
pdf-corpora
An index of PDF-centric corpora
pdf-js-csv
Exploring extracting tables from a PDF to CSV using PDF.JS
pdf2json
A PDF file parser that converts PDF binaries to text-based JSON, powered by a fork of PDF.JS
pdfcpu
A PDF processor written in Go.
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.