shahmohamadi / PDF_TEXT_OCR


This code converts a scanned Farsi PDF document into page images and then extracts their text into a text file using Tesseract, the open-source OCR engine whose development was long sponsored by Google. To use it for another language, change the "lang='fas'" argument of the pytesseract.image_to_string function to the matching Tesseract language code (the corresponding language data must be installed).
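
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the pdf2image package (which needs Poppler) is used for rendering pages, and the function name pdf_to_text, its parameters, and the file names are hypothetical. The two optional hooks, to_images and ocr, exist only so the flow can be tried without Poppler and Tesseract installed.

```python
from pathlib import Path


def pdf_to_text(pdf_path, out_path, lang="fas", dpi=300, to_images=None, ocr=None):
    """Render each PDF page to an image, OCR it, and save the joined text.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if to_images is None:
        # Assumption: pdf2image (backed by Poppler) renders the pages.
        from pdf2image import convert_from_path
        to_images = lambda path: convert_from_path(path, dpi=dpi)
    if ocr is None:
        # pytesseract wraps the Tesseract binary; lang selects the language data.
        import pytesseract
        ocr = lambda image: pytesseract.image_to_string(image, lang=lang)

    # OCR the pages in order and separate them with a blank line.
    pages = [ocr(image) for image in to_images(str(pdf_path))]
    text = "\n\n".join(pages)

    # Farsi text needs a Unicode encoding on disk.
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

With Poppler, Tesseract, and the Farsi language data installed, a call such as pdf_to_text("scan.pdf", "scan.txt") would use the defaults; passing lang="eng" would switch to English. A higher dpi value generally gives Tesseract more detail to work with, at the cost of speed and memory.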

Languages

Language: Jupyter Notebook 100.0%