uzairkabeer1 / Python-PDF-Scraper

Repository from GitHub: https://github.com/uzairkabeer1/Python-PDF-Scraper

Python-PDF-Scraper

Previous version

In this file, I used the pdfquery library. It converts the PDF to XML, and I query that XML to get the specific PDF data I need.
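A minimal sketch of the PDF-to-XML approach, not the script from this repository. It assumes pdfquery's documented PDFQuery, load, tree.write, and pq calls; the helper names, the file names, and the bounding-box coordinates are hypothetical and depend on the layout of your own PDF.

```python
def bbox_selector(x0, y0, x1, y1, tag="LTTextLineHorizontal"):
    """Build a pdfquery selector for elements of `tag` inside a bounding box.

    Coordinates are PDF points with the origin at the bottom-left of the page.
    """
    return f'{tag}:in_bbox("{x0}, {y0}, {x1}, {y1}")'


def extract_region(pdf_path, bbox, xml_path=None):
    """Load a PDF with pdfquery and return the text found inside `bbox`.

    If `xml_path` is given, the parsed tree is also written out as XML so the
    element names and coordinates can be inspected by hand.
    """
    import pdfquery  # imported lazily so the helper above works without it

    pdf = pdfquery.PDFQuery(pdf_path)
    pdf.load()  # parse the PDF into an XML-like element tree

    if xml_path is not None:
        pdf.tree.write(xml_path, pretty_print=True, encoding="utf-8")

    return pdf.pq(bbox_selector(*bbox)).text()


if __name__ == "__main__":
    # Hypothetical file names and coordinates, for illustration only.
    print(extract_region("sample.pdf", (50, 700, 300, 720), xml_path="sample.xml"))
```

Opening the written XML file first is the easiest way to find the coordinates of the field you want before hard-coding a bounding box.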

Newer version

This version uses PyMuPDF (imported in Python as fitz) to extract the highlighted text from a PDF. It requires no XML conversion, and it is a lot faster and fairly more accurate. Before running it, run the command: pip install PyMuPDF
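A minimal sketch of highlight extraction with PyMuPDF, not the script from this repository. It assumes the usual PyMuPDF calls (fitz.open, page.annots, page.get_text("words")); the function names and the file name are hypothetical. A word counts as highlighted when its center lies inside the highlight annotation's rectangle.

```python
def words_in_rect(words, rect):
    """Join the words whose center point lies inside rect = (x0, y0, x1, y1).

    `words` uses the tuple layout returned by PyMuPDF's page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    x0, y0, x1, y1 = rect
    picked = []
    for wx0, wy0, wx1, wy1, text, *_ in words:
        cx = (wx0 + wx1) / 2
        cy = (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            picked.append(text)
    return " ".join(picked)


def extract_highlights(pdf_path):
    """Return one string per highlight annotation in the PDF."""
    import fitz  # PyMuPDF; imported lazily so words_in_rect works without it

    results = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            words = page.get_text("words")
            for annot in page.annots():
                if annot.type[0] == fitz.PDF_ANNOT_HIGHLIGHT:
                    r = annot.rect
                    results.append(words_in_rect(words, (r.x0, r.y0, r.x1, r.y1)))
    return results


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    for text in extract_highlights("highlighted.pdf"):
        print(text)
```

The annotation rectangle is the bounding box of the whole highlight, so a highlight that spans several lines may also pick up unhighlighted words at the start of the first line or the end of the last one.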
