shahmohamadi / PDF_TEXT_OCR

This code renders each page of a scanned Farsi PDF document as an image and then converts the images to a text file using Tesseract, the open-source OCR engine whose development was long sponsored by Google. You can use it for any other language by changing the `lang='fas'` argument of the `pytesseract.image_to_string` function to the Tesseract code of that language, provided the matching language data is installed.
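The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes the `pdf2image` package (which needs Poppler installed) for the PDF-to-image step, since the description does not say which library the notebook uses, and the function names `pdf_to_text` and `join_pages` are hypothetical. Only `pytesseract.image_to_string` with its `lang` parameter comes from the description.

```python
from pathlib import Path


def join_pages(texts):
    """Join per-page OCR results, separating pages with a blank line."""
    return "\n\n".join(t.strip() for t in texts)


def pdf_to_text(pdf_path, out_path, lang="fas", dpi=300):
    """OCR a scanned PDF into a UTF-8 text file and return the page count.

    Hypothetical helper: the name, the dpi default, and the use of
    pdf2image are assumptions, not taken from the repository.
    """
    # Imported lazily so this module loads even without the OCR stack.
    from pdf2image import convert_from_path  # assumption: needs Poppler
    import pytesseract  # needs the Tesseract binary and language data

    # Step 1: render every PDF page as a PIL image.
    pages = convert_from_path(str(pdf_path), dpi=dpi)

    # Step 2: run Tesseract on each page image in the requested language.
    texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]

    # Step 3: write the combined text; UTF-8 keeps Farsi characters intact.
    Path(out_path).write_text(join_pages(texts), encoding="utf-8")
    return len(pages)
```

For example, `pdf_to_text("scan.pdf", "scan.txt")` would OCR a Farsi document, and `pdf_to_text("scan.pdf", "scan.txt", lang="eng")` would do the same for English; the file names here are placeholders.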

Languages: Jupyter Notebook 100.0%