There are 2 repositories under pdfminer topic.
Create a Gephi Citation Graph based on Text Analysis of PDFs from Zotero
Text extraction from multiple and large PDF documents.
Automatically crop the white margins from PDF figures.
A tool that parses resumes and transforms them into our own templates
DouFinder: Script for searching for, and alerting on, terms in the Diário Oficial da União (DOU).
An automatic translation tool for papers (PDF => TXT, English => Chinese)
Scans a directory for IMRT QA results
An API built with FastAPI for extracting the text content of PDFs using pdfminer. It also supports scanned images in PDFs by using Tesseract and OCRmyPDF.
OCR built for the specific use case of extracting COVID information from images, PDFs, and text
Automate the case review on legal case documents.
A resume scanner for Applicant Tracking Systems (ATS) to assess the similarity between applicants' resumes and job descriptions
PDF parser using pdfminer and pytesseract for OCR support
A more complete example of programming with PDFMiner, which continues where the default documentation stops
CLI program for searching inside text and tables in PDF documents and displaying results in HTML.
PDFs are notoriously difficult to scrape. This program converts them to *.txt or *.html formats. The program has been tested with Latin-alphabet and Japanese text.
Splits a PDF file into separate documents and then uses Natural Language Processing, Machine Learning models, and statistics to rank the documents by similarity to a single document.
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible: it lets you choose among different text extraction libraries and supports both a single PDF file and a directory containing multiple PDF files.
This tool searches for one or more keywords in a hierarchy of PDF files and generates an HTML report of the results.
PDF Classifier for a Mortgage Company
Scraped the web to create an Indian NER dataset, injected Indian data into the CoNLL data, fine-tuned BERT, used weighted fastText with unsupervised KNN to classify reports, and extracted data from PDFs
[2023-01] A Python Flask API to extract metadata and text from PDF files. Asynchronous tasks executed with a Celery queue and Redis workers. SQLite storage managed by SQLAlchemy. Clean code with Flake8 and isort. Coverage tested with pytest-cov. See the documentation in the Readme.md and check the API contract with Swagger.
An app that checks drawings in the "Kornit" drawing template
Extract tables from a PDF document, crop them, and convert them to JPG files
Image to Text with Flask application
In this project, users can upload a resume PDF to learn about its strengths and weaknesses, get suggestions for improvement, find the right domain, and search for jobs based on that domain
Official project: conversion of PDF files into JS files adapted for react-pdf
Given a set of PDFs and a query, finds the most relevant PDF using TF-IDF. The code implements TF-IDF without relying on any library.
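
For readers unfamiliar with the technique named in the last entry, here is a minimal sketch of library-free TF-IDF ranking. It is not taken from that repository: the names `tokenize` and `rank_documents` are hypothetical, the `+ 1.0` smoothing of the IDF term is one common choice among several, and the documents are assumed to be plain-text strings already extracted from the PDFs.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def rank_documents(docs, query):
    """Return document indices ordered by TF-IDF score for the query, best first.

    `docs` is assumed to hold text already extracted from each PDF.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total
                # Assumed smoothing: +1 keeps terms found in every document from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += tf * idf
        scores.append(score)

    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
```

Calling `rank_documents(texts, "pdfminer text")` returns the indices of `texts` with the most relevant document first; documents that contain none of the query terms score zero and keep their original relative order.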