OCR workflow

Procedure, given a (scanned) PDF:

1. Pagination (split the PDF into pages)
2. Convert each PDF page to an image
3. OCR each image to text
4. Merge the texts into one file

Dependencies: pdftk, ghostscript, tesseract

Usage:

    ./run.sh <input_pdf> -l <language>
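
The steps above (pagination, image conversion, OCR, merge) could be wired together roughly as follows. This is a minimal sketch, not the actual contents of run.sh: the function name `ocr_pdf`, the `eng` default language, the 300 dpi grayscale PNG rendering, the temporary working directory, and the `<input>.txt` output name are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four-step workflow; the real run.sh may differ.
set -eu

ocr_pdf() {
    # Usage mirrors the README: ocr_pdf <input_pdf> [-l <language>]
    local input="${1:?usage: ocr_pdf <input_pdf> [-l <language>]}"
    local lang="eng"   # assumed default when -l is not given
    shift
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -l) [ "$#" -ge 2 ] || { echo "-l needs a language" >&2; return 1; }
                lang="$2"; shift 2 ;;
            *)  echo "unknown option: $1" >&2; return 1 ;;
        esac
    done

    # Fail early if one of the three dependencies is missing.
    local tool
    for tool in pdftk gs tesseract; do
        command -v "$tool" >/dev/null || { echo "missing dependency: $tool" >&2; return 1; }
    done

    local workdir output page base
    workdir="$(mktemp -d)"
    output="${input%.pdf}.txt"   # assumed output name: same path, .txt suffix

    # 1. Pagination: write one PDF per page (page_0001.pdf, page_0002.pdf, ...).
    pdftk "$input" burst output "$workdir/page_%04d.pdf"

    for page in "$workdir"/page_*.pdf; do
        base="${page%.pdf}"
        # 2. Render the page to a 300 dpi grayscale PNG with Ghostscript.
        gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pnggray -r300 \
           -sOutputFile="$base.png" "$page"
        # 3. OCR the image; tesseract appends ".txt" to the output base name.
        tesseract "$base.png" "$base" -l "$lang"
    done

    # 4. Merge: zero-padded names make the glob expand in page order.
    cat "$workdir"/page_*.txt > "$output"
    rm -rf "$workdir"
    echo "$output"
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then ocr_pdf "$@"; fi
```

Saved under a hypothetical name such as `sketch.sh`, it would be invoked the same way as the usage line above, for example `./sketch.sh scan.pdf -l deu`, and would print the path of the merged text file. Splitting into pages first keeps each Ghostscript and tesseract call small and makes it easy to retry a single failed page.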