JSchoonmaker / PDF-Text-Extraction



PDF-Text-Extraction

A notebook showing four methods for extracting text from PDF files, using the Python packages PyPDF2, pdfminer.six, PyMuPDF, and GROBID.
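As a rough illustration of what the four approaches can look like, here is a minimal sketch. It is not taken from the notebook. The function names, the `EXTRACTORS` registry, and `extract_all` are hypothetical. The GROBID function assumes a GROBID server is already running locally at `http://localhost:8070` and that the `requests` package is installed; it returns TEI XML rather than plain text. Each library is imported inside its own function so that only the ones you call need to be installed.

```python
def extract_with_pypdf2(pdf_path):
    """Concatenate the text of every page using PyPDF2 (2.x or later)."""
    from PyPDF2 import PdfReader
    reader = PdfReader(pdf_path)
    # extract_text() can return None for pages with no text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pdfminer(pdf_path):
    """Extract the whole document in one call with pdfminer.six."""
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def extract_with_pymupdf(pdf_path):
    """Concatenate the text of every page using PyMuPDF (imported as fitz)."""
    import fitz
    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_grobid(pdf_path, server="http://localhost:8070"):
    """Send the PDF to a running GROBID server and return its TEI XML.

    Assumption: a GROBID service is reachable at `server`.
    """
    import requests
    with open(pdf_path, "rb") as fh:
        response = requests.post(
            f"{server}/api/processFulltextDocument",
            files={"input": fh},
            timeout=120,
        )
    response.raise_for_status()
    return response.text


# Hypothetical registry mapping a method name to its extractor.
EXTRACTORS = {
    "pypdf2": extract_with_pypdf2,
    "pdfminer.six": extract_with_pdfminer,
    "pymupdf": extract_with_pymupdf,
    "grobid": extract_with_grobid,
}


def extract_all(pdf_path, methods=None):
    """Run the selected extractors (all by default) on one PDF."""
    names = list(EXTRACTORS) if methods is None else list(methods)
    return {name: EXTRACTORS[name](pdf_path) for name in names}
```

For example, `extract_all("paper.pdf", methods=["pypdf2", "pymupdf"])` would return a dictionary with one text string per method, where `paper.pdf` stands in for any local file.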

The text output of each method is compared on Levenshtein distance, cosine similarity, TF-IDF similarity, and processing time.
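The comparison metrics can be sketched with the standard library alone. This is an illustrative version, not the notebook's code: which libraries the notebook uses is not stated here, and the whitespace tokenizer, the smoothed IDF formula, and all helper names below are assumptions chosen for brevity.

```python
import math
import time
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings, keeping only two DP rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def term_counts(text):
    """Lowercased whitespace tokens mapped to their counts (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf_vectors(texts):
    """TF-IDF vectors with a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    counts = [term_counts(t) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy strings standing in for a reference text and one extractor's output.
reference = "the quick brown fox jumps over the lazy dog"
extracted = "the quick brown f0x jumps over the lazy dog"

distance, seconds = timed(levenshtein, reference, extracted)
count_similarity = cosine(term_counts(reference), term_counts(extracted))
ref_vec, ext_vec = tfidf_vectors([reference, extracted])
tfidf_similarity = cosine(ref_vec, ext_vec)
print(distance, round(count_similarity, 3), round(tfidf_similarity, 3), seconds)
```

Lower Levenshtein distance and higher cosine or TF-IDF similarity both indicate output closer to the reference; wrapping each extractor call in `timed` gives the processing-time column of such a comparison.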

Languages

Language:Jupyter Notebook 100.0%