pdf-to-text streamlit streamlit-webapp text-extraction python ocr ocr-python ocr-text-reader pdf

PDF to Text

PDF to Text is an app that extracts text from a PDF document and returns either a single txt file containing all pages or a compressed folder with one txt file per page. OCR can also be enabled for scanned documents.

How does it work?

flowchart LR

A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]
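
The two output options in the flowchart can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: it assumes the page texts have already been extracted into a list of strings, and the function names `pages_to_txt` and `pages_to_zip` as well as the `page_N.txt` naming scheme are hypothetical.

```python
import io
import zipfile


def pages_to_txt(pages: list[str]) -> bytes:
    """Option 1: join all page texts into a single UTF-8 txt payload."""
    # A blank line between pages is an assumed separator, not the app's format.
    return "\n\n".join(pages).encode("utf-8")


def pages_to_zip(pages: list[str], prefix: str = "page") -> bytes:
    """Option 2: store each page as its own txt file in an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(pages, start=1):
            # Hypothetical naming scheme: page_1.txt, page_2.txt, ...
            archive.writestr(f"{prefix}_{number}.txt", text)
    # The archive is complete once the `with` block has closed it.
    return buffer.getvalue()
```

Both helpers return bytes, so either result could be handed directly to a download control in a web app without writing temporary files to disk.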

1. Upload your PDF.
2. Enable OCR if the document is scanned.
3. Select the language of the PDF.
4. Download your output file (zip or txt).
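
The choice made in steps 2 and 3 can be sketched as a small dispatch function. This is a hedged illustration under several assumptions: the real extraction backends are not named in this document, so they are passed in as callables (`read_text_layer` and `run_ocr` are hypothetical names); the language table is an illustrative subset using Tesseract-style codes; and the sketch assumes the selected language is only needed on the OCR path.

```python
from typing import Callable, Iterable

# Illustrative subset: display name -> Tesseract-style language code.
# The languages the app actually offers are not listed in this document.
LANGUAGE_CODES = {"English": "eng", "French": "fra", "Arabic": "ara", "Spanish": "spa"}


def extract_pages(
    pdf_bytes: bytes,
    *,
    use_ocr: bool,
    language: str,
    read_text_layer: Callable[[bytes], Iterable[str]],
    run_ocr: Callable[[bytes, str], Iterable[str]],
) -> list[str]:
    """Return one string per page, using OCR only when it is enabled."""
    if use_ocr:
        # Scanned documents have no text layer, so every page goes through OCR.
        code = LANGUAGE_CODES[language]
        return list(run_ocr(pdf_bytes, code))
    # Digitally created PDFs: read the embedded text directly.
    return list(read_text_layer(pdf_bytes))
```

Passing the backends in as parameters keeps the sketch independent of any particular PDF or OCR library, and makes the branch easy to exercise with stand-in functions.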

How to support the project

You can support the project by giving feedback and/or buying me a coffee.

About

PDF text data extraction web app with OCR for scanned documents

https://share.streamlit.io/nainiayoub/pdf-text-data-extractor/main/app.py


Languages

Python 100.0%