pd3f

There are 2 repositories under the pd3f topic.

pd3f / pd3f
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
extract-text language-model machine-learning ocr parsr pd3f pdf pdf-to-text pipeline python text-extraction
Language:HTML 286
pd3f / dehyphen
📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF
dehyphenation flair flair-embeddings german hyphen hyphens nlp pd3f pdf python
Language:Python 37
pd3f / pd3f-core
📑 Python Package to reconstruct the original continuous text from PDFs with language models
dehyphenation language-model machine-learning pd3f pdf text-extraction
Language:Jupyter Notebook 34
pd3f / pd3-flair
Flair's language models without unnecessary dependencies
pd3f
Language:Python 3
pd3f / pd3f-dataset-bmjv
Dataset of (mostly German) PDFs used to develop pd3f
german pd3f pdf
Language:Python 1
pd3f / pd3f-results
Results with pd3f on some PDF datasets
pd3f
Language:Jupyter Notebook 1
pd3f / pd3f.com
📝 Website to advertise & document pd3f
hugo-academic pd3f
Language:JavaScript 1
