alephpi / ocr

OCR workflow

procedure

Given a (scanned) PDF:
1. paginate: split the PDF into single pages
2. convert each page to an image
3. OCR each image to text
4. merge the page texts into one file
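
The four steps above could be sketched roughly as follows. This is not the actual run.sh, which is not shown here. It assumes pdftk does the page split, ghostscript (`gs`) renders the images, and tesseract does the OCR. The function name `ocr_pdf`, the `page_%04d` naming, the 300 dpi resolution, and the `eng` default are all illustrative choices.

```shell
#!/bin/sh
# Hypothetical sketch of the procedure; names and settings are assumptions.

ocr_pdf() {
    input_pdf=$1      # path to the (scanned) PDF
    lang=${2:-eng}    # Tesseract language code; the "eng" default is an assumption
    workdir=$(mktemp -d) || return 1

    # 1. pagination: split the PDF into one file per page
    pdftk "$input_pdf" burst output "$workdir/page_%04d.pdf" || return 1

    for page in "$workdir"/page_*.pdf; do
        base=${page%.pdf}
        # 2. convert the page to a 300 dpi PNG image
        gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r300 \
           -sOutputFile="$base.png" "$page" || return 1
        # 3. OCR the image; tesseract writes its result to "$base.txt"
        tesseract "$base.png" "$base" -l "$lang" || return 1
    done

    # 4. merge the per-page texts in page order (zero-padded names sort correctly)
    cat "$workdir"/page_*.txt > "${input_pdf%.pdf}.txt"
    rm -rf "$workdir"
}
```

With this sketch, `ocr_pdf scan.pdf eng` would leave the merged text in `scan.txt` next to the input. Depending on the pdftk version, `burst` may also write a `doc_data.txt` report into the current directory.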

dependencies: pdftk, ghostscript, tesseract
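
A quick way to confirm the dependencies are installed is to look each one up on the PATH. The helper below is a minimal sketch; `check_deps` is a hypothetical name and is not part of this repository.

```shell
# Report every required command that is not found on PATH.
# Returns 0 if all are present, 1 otherwise.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_deps pdftk gs tesseract || exit 1` near the top of a script would stop early if a tool is absent. Note that the ghostscript executable is usually named `gs`.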

usage

./run.sh <input_pdf> -l <language>
