alephpi / ocr

OCR workflow

procedure

  • Given a (scanned) PDF:
    1. paginate (split the PDF into pages)
    2. convert each page to an image
    3. OCR each image to text
    4. merge the page texts into one
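
The four steps above can be sketched as a single shell function. This is only an illustration, not the repository's actual run.sh: the function name `ocr_pdf`, the `page_%04d` naming, the 300 dpi grayscale PNG settings, the `eng` default, and the output path next to the input are all assumptions. The mapping of each tool to a step is also inferred from the dependency list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the workflow; names and settings are assumptions.

ocr_pdf() {
    local input_pdf="$1"      # path to the (scanned) PDF
    local lang="${2:-eng}"    # tesseract language code; default is an assumption
    local workdir page img
    workdir="$(mktemp -d)" || return 1

    # 1. pagination: split the PDF into one file per page
    pdftk "$input_pdf" burst output "$workdir/page_%04d.pdf" || return 1

    # 2. convert each page to a 300 dpi grayscale PNG
    for page in "$workdir"/page_*.pdf; do
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pnggray -r300 \
           -sOutputFile="${page%.pdf}.png" "$page" || return 1
    done

    # 3. OCR each image; tesseract writes <outputbase>.txt
    for img in "$workdir"/page_*.png; do
        tesseract "$img" "${img%.png}" -l "$lang" || return 1
    done

    # 4. merge the per-page texts in page order (zero-padded names sort correctly)
    cat "$workdir"/page_*.txt > "${input_pdf%.pdf}.txt" || return 1
    rm -rf "$workdir"
}
```

Calling `ocr_pdf scan.pdf fra` would, under these assumptions, leave the merged text in `scan.txt`. Working in a temporary directory keeps the intermediate page files out of the directory that holds the input.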

dependencies: pdftk, ghostscript, tesseract

usage

./run.sh <input_pdf> -l <language>
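
For illustration, a command line of this shape could be parsed as in the sketch below. This is not taken from run.sh: the function name `parse_args`, the `eng` default, and the error messages are hypothetical. It also assumes that `<language>` is a tesseract language code such as `eng` or `fra`.

```shell
#!/usr/bin/env bash
# Hypothetical argument parsing for: ./run.sh <input_pdf> -l <language>

parse_args() {
    input_pdf=""
    lang="eng"   # assumed default when -l is omitted
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -l)
                # -l must be followed by a language code
                [ "$#" -ge 2 ] || { echo "error: -l needs a language code" >&2; return 2; }
                lang="$2"
                shift 2
                ;;
            -*)
                echo "error: unknown option $1" >&2
                return 2
                ;;
            *)
                # the positional argument is the input PDF
                input_pdf="$1"
                shift
                ;;
        esac
    done
    [ -n "$input_pdf" ] || { echo "usage: ./run.sh <input_pdf> -l <language>" >&2; return 2; }
}
```

A manual loop is used rather than `getopts` because the positional `<input_pdf>` comes before the `-l` option, and `getopts` stops at the first non-option argument.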