pd3f

PDF text extraction pipeline: self-hosted, local-first and Docker-based

Home page: https://pd3f.com

Twitter: @pd3f_

pd3f's repositories

pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

Language: HTML | License: AGPL-3.0 | Stargazers: 256 | Issues: 6 | Issues: 21

dehyphen

📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

Language: Python | License: GPL-3.0 | Stargazers: 34 | Issues: 2 | Issues: 4

pd3f-core

📑 Python package to reconstruct the original continuous text from PDFs with language models

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 30 | Issues: 3 | Issues: 3

pd3-flair

Flair's language models without unnecessary dependencies

Language: Python | License: NOASSERTION | Stargazers: 3 | Issues: 2 | Issues: 0

pd3f-dataset-bmjv

Dataset of (mostly German) PDFs used to develop pd3f

Language: Python | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 0

pd3f-results

Results with pd3f on some PDF datasets

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 1 | Issues: 2 | Issues: 0

pd3f.com

πŸ“ Website to advertise & document pd3f

Language: JavaScript | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 0