Golang package to download data from www.duyaoss.com, and perform OCR using gosseract and GoCV.
- For `gosseract` to work, Tesseract needs to be installed on your system. It is included in most Linux distributions under the name `tesseract` or `tesseract-ocr`. You should also install:
  - Two trained language data modules for Tesseract: `tesseract-data-eng` and `tesseract-data-chi_sim`. See their official documentation for details.
  - Library and header files. In Ubuntu the package is called `libtesseract-dev`.
- `GoCV` is used to preprocess the images for better OCR results. You must also install OpenCV 4.5.0 on your system. The GoCV documentation goes through the process in great detail. Personally, I found it necessary to also install the `vtk` and `glew` libraries.
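
To confirm the Tesseract prerequisites before building, something like the following may help. It is only a sketch and not part of this package: it uses the Go standard library to look for a `tesseract` binary on your `PATH` and to run `tesseract --list-langs`, then checks that `eng` and `chi_sim` appear in the output. The helper name `missingLangs` is hypothetical, and the check assumes the language list is printed one code per line.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// missingLangs is a hypothetical helper. It returns the entries of required
// that do not appear as a line in the output of `tesseract --list-langs`.
func missingLangs(listOutput string, required []string) []string {
	installed := make(map[string]bool)
	for _, line := range strings.Split(listOutput, "\n") {
		// The header line is stored too, which is harmless for the lookup.
		installed[strings.TrimSpace(line)] = true
	}
	var missing []string
	for _, lang := range required {
		if !installed[lang] {
			missing = append(missing, lang)
		}
	}
	return missing
}

func main() {
	// Look for the Tesseract command-line binary on PATH.
	path, err := exec.LookPath("tesseract")
	if err != nil {
		fmt.Println("tesseract not found on PATH; install tesseract or tesseract-ocr")
		return
	}
	fmt.Println("found tesseract at", path)

	// Older Tesseract versions print the list to stderr, so capture both streams.
	out, err := exec.Command(path, "--list-langs").CombinedOutput()
	if err != nil {
		fmt.Println("could not list languages:", err)
		return
	}

	missing := missingLangs(string(out), []string{"eng", "chi_sim"})
	if len(missing) > 0 {
		fmt.Println("missing language data:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("eng and chi_sim language data are installed")
}
```

This does not check for the development headers (`libtesseract-dev`) or for OpenCV; a failing `go build` is still the most direct signal for those.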