azardilis / extract-text-pdf

You need ell.traineddata, the Tesseract model for Greek (https://github.com/tesseract-ocr/tessdata/blob/master/ell.traineddata),
in the tessdata directory that you pass to tesseract with --tessdata-dir.
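If the Greek model is missing, tesseract fails with an error about loading the language. A quick pre-flight check catches this before the OCR run. This is a minimal sketch, assuming Python 3.9+. `find_traineddata` is a hypothetical helper and is not part of this repo:

```python
from pathlib import Path


def find_traineddata(tessdata_dir: str, lang: str = "ell") -> Path:
    """Return the path to <lang>.traineddata inside tessdata_dir.

    Raises FileNotFoundError if the model file is missing, so the OCR step
    can fail early with a clear message.
    """
    # expanduser() lets callers pass "~/Downloads/" as in the commands below
    path = Path(tessdata_dir).expanduser() / f"{lang}.traineddata"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download {lang}.traineddata from the tessdata repo"
        )
    return path
```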

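Extract the images from the PDF, then run OCR on each one (tesseract writes the recognized text to output.txt):

> pdfimages anafentos_cyprob_traino4.pdf cyprob-page
> tesseract cyprob-page-001.ppm output -l ell --tessdata-dir ~/Downloads/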
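The two commands can be scripted so that every image pdfimages produces gets OCRed. This is a minimal sketch using `subprocess`. The helper names `pdfimages_cmd`, `tesseract_cmd` and `ocr_pdf` are hypothetical. It assumes both tools are on PATH and that pdfimages writes its default PPM/PBM files (no `-j`/`-png` flag):

```python
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf: str, root: str) -> list[str]:
    # pdfimages writes every embedded image as <root>-NNN.<ext>
    return ["pdfimages", pdf, root]


def tesseract_cmd(image: str, outbase: str, lang: str = "ell",
                  tessdata_dir: str = "~/Downloads/") -> list[str]:
    # tesseract appends ".txt" to outbase when writing plain text
    return ["tesseract", image, outbase, "-l", lang,
            "--tessdata-dir", str(Path(tessdata_dir).expanduser())]


def ocr_pdf(pdf: str, root: str = "cyprob-page", **kwargs) -> list[Path]:
    """Extract all images from pdf, OCR each one, and return the .txt paths."""
    subprocess.run(pdfimages_cmd(pdf, root), check=True)
    root_path = Path(root)
    # Match .ppm/.pgm/.pbm; zero-padded NNN keeps lexical order = page order
    images = sorted(root_path.parent.glob(f"{root_path.name}-*.p[bgp]m"))
    outputs = []
    for image in images:
        outbase = image.parent / image.stem  # cyprob-page-001.ppm -> cyprob-page-001
        subprocess.run(tesseract_cmd(str(image), str(outbase), **kwargs), check=True)
        outputs.append(image.parent / f"{image.stem}.txt")
    return outputs
```

For example, `ocr_pdf("anafentos_cyprob_traino4.pdf")` would produce one cyprob-page-NNN.txt file per extracted image.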


Then spell-check the OCR output with auto_spell_check.py, which automatically replaces misspelled words with the suggested ones.
First you need the Greek aspell dictionary: https://ftp.gnu.org/gnu/aspell/dict/el/
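One way to do the automatic replacement is aspell's Ispell-compatible pipe mode (`aspell -a`). The sketch below is not the code of auto_spell_check.py. The function names are hypothetical, and it assumes the Greek dictionary is installed under the language code `el`. It simply takes the first suggestion for each misspelled word:

```python
import re
import subprocess
from typing import Callable, Optional

# Letters only (no digits/underscore); matches accented Greek letters in str patterns
WORD_RE = re.compile(r"[^\W\d_]+")


def parse_pipe_line(line: str) -> Optional[tuple[str, list[str]]]:
    """Parse one result line of `aspell -a` (Ispell pipe protocol).

    '& word count offset: s1, s2, ...' -> (word, [s1, s2, ...])
    '# word offset'                     -> (word, [])   misspelled, no suggestions
    anything else ('*', '@(#)...', '')  -> None
    """
    if line.startswith("& "):
        head, _, tail = line.partition(": ")
        word = head.split()[1]
        return word, [s.strip() for s in tail.split(", ") if s.strip()]
    if line.startswith("# "):
        return line.split()[1], []
    return None


def aspell_suggest(word: str, lang: str = "el") -> list[str]:
    """Return aspell's suggestions for one word ([] if correct or no suggestions)."""
    # A leading '^' tells pipe mode to treat the rest of the line as plain text
    result = subprocess.run(
        ["aspell", "-a", f"--lang={lang}", "--encoding=utf-8"],
        input=f"^{word}\n", capture_output=True, encoding="utf-8", check=True,
    )
    for line in result.stdout.splitlines():
        parsed = parse_pipe_line(line)
        if parsed:
            return parsed[1]
    return []


def auto_correct(text: str, suggest: Callable[[str], list[str]]) -> str:
    """Replace each word that has suggestions with the first suggestion."""
    def fix(match: re.Match) -> str:
        suggestions = suggest(match.group(0))
        return suggestions[0] if suggestions else match.group(0)
    return WORD_RE.sub(fix, text)
```

Usage: `auto_correct(open("output.txt", encoding="utf-8").read(), aspell_suggest)`. Starting one aspell process per word is slow, which is why caching repeated calls matters.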

Memoization of function calls to aspell suggest:
https://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python
https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
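Below is a minimal dict-based memoize decorator, similar in spirit to the linked recipes, followed by the standard-library equivalent `functools.lru_cache`. `slow_suggest` and `lru_suggest` are hypothetical stand-ins for a call to aspell. Each one records every real invocation, so you can see which calls the cache skips:

```python
import functools


def memoize(func):
    """Cache func's results in a dict keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:  # compute once per distinct argument tuple
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed for inspection
    return wrapper


calls = []


@memoize
def slow_suggest(word):
    # Hypothetical stand-in for an aspell subprocess call
    calls.append(word)
    return (word.upper(),)


lru_calls = []


@functools.lru_cache(maxsize=None)  # unbounded cache, same idea from the stdlib
def lru_suggest(word):
    lru_calls.append(word)
    return (word.upper(),)
```

Caching pays off here because OCR text repeats the same words many times, so each distinct word starts only one aspell process. Cached results are shared between callers, which is why both stand-ins return tuples rather than mutable lists.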

Languages

Language: Python 100.0%