pdf ocr ubuntu python vision-api tesseract tesseract-ocr tesseract-ocr-api txt

PDF to TXT

Python code to do OCR recognition of a PDF file and export text to TXT file.

LocalOCR: based on Tesseract OCR
CloudOCR: based on Google Vision API

Setup for LocalOCR on Ubuntu

apt-get install python-pyocr python-wand imagemagick
apt-get install libleptonica-dev tesseract-ocr-dev
apt-get install tesseract-ocr-ita
pip install -r requirements.txt

Setup CloudOCR on Ubuntu

Install Google Cloud SDK

apt-get install pdfimages google-cloud-sdk-app-engine-python
pip install requirements.txt

About

Python code to read text from a PDF file (OCR).

https://github.com/lucab85/PDFtoTXT

pdf ocr ubuntu python vision-api tesseract tesseract-ocr tesseract-ocr-api txt

MIT License

Languages

Language:Python 100.0%