renard314 / textfairy

Android OCR App

Update to Tesseract V4 + add Arabic language

ofirkris opened this issue · comments

101 languages are available in Tesseract V4.
Languages:
https://github.com/tesseract-ocr/tessdata

BTW:
You've done a great job!

I agree V4 is very promising, especially since it works much better with complex scripts like Arabic.
But I think it's not yet stable enough to be used in production. There are a few open issues that seem critical to me. Also, it's not yet compiling or running on Android. When I evaluated it, it crashed on my device.
But I will keep an eye on it and will update as soon as there is a stable release that works on mobile.

Thanks!
I'm doing research on computer vision, and Text Fairy is a great resource.

Tesseract 4 compiled perfectly on Ubuntu. Too bad it doesn't run on Android yet, because it really looks promising, even for Hindi and other languages that already work.

Furthermore, is Text Fairy truly open source?
I mean, I couldn't build the shared library (.so) that does the binarization. It needs PixAdaptiveBinarizer.cpp, which isn't shared here.
Please advise.

Try building the latest master with ./gradlew assembleDevelop

PixAdaptiveBinarizer.cpp is an optional dependency. If it's not there, the app falls back to a simple tiling binarizer that is based on statistical histogram analysis. It works decently well on black-on-white text. If you want to use a different binarization algorithm, just edit the binarize function in pixFunc.cpp and add your code there (you can try pixSauvolaBinarize). But you will find that it requires you to define some parameters that should differ depending on the image.
That's where PixAdaptiveBinarizer.cpp comes into play.
PixAdaptiveBinarizer.cpp is my newer binarization algorithm. It is mostly parameter-free, and I spent some months developing and fine-tuning it. It can handle mixed colors (black on white and white on black in the same document), and it's also a bit more robust when dealing with highly varying font sizes. Please understand that I'm not comfortable open-sourcing it.
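
For readers wondering why Sauvola-style binarization needs per-image tuning, here is a minimal, self-contained sketch of the textbook Sauvola threshold. It is not the code from pixFunc.cpp, not Leptonica's pixSauvolaBinarize, and has nothing to do with PixAdaptiveBinarizer.cpp. The function name sauvolaBinarize, the raw grayscale buffer layout, and the constant R = 128 are assumptions made for illustration only. The two parameters that have to be chosen per image are halfWindow and k.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration; not part of Text Fairy or Leptonica.
// gray:       row-major 8-bit grayscale image of size width * height
// halfWindow: half the side length of the local window (window = 2*halfWindow + 1)
// k:          Sauvola factor; larger values lower the threshold
// Returns a mask of the same size: 1 = foreground (dark text), 0 = background.
std::vector<std::uint8_t> sauvolaBinarize(const std::vector<std::uint8_t>& gray,
                                          int width, int height,
                                          int halfWindow, double k) {
    const double R = 128.0;  // assumed dynamic range of the standard deviation
    const std::size_t stride = static_cast<std::size_t>(width) + 1;

    // Integral images of the values and squared values, padded by one row/column
    // of zeros, so any window sum can be read with four lookups.
    std::vector<double> sum(stride * (height + 1), 0.0);
    std::vector<double> sumSq(stride * (height + 1), 0.0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double v = gray[static_cast<std::size_t>(y) * width + x];
            const std::size_t i = (y + 1) * stride + (x + 1);
            sum[i] = v + sum[i - 1] + sum[i - stride] - sum[i - stride - 1];
            sumSq[i] = v * v + sumSq[i - 1] + sumSq[i - stride] - sumSq[i - stride - 1];
        }
    }

    std::vector<std::uint8_t> mask(gray.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Clip the window to the image borders.
            const std::size_t x0 = std::max(0, x - halfWindow);
            const std::size_t y0 = std::max(0, y - halfWindow);
            const std::size_t x1 = std::min(width, x + halfWindow + 1);
            const std::size_t y1 = std::min(height, y + halfWindow + 1);
            const double n = static_cast<double>((x1 - x0) * (y1 - y0));

            const double s = sum[y1 * stride + x1] - sum[y0 * stride + x1]
                           - sum[y1 * stride + x0] + sum[y0 * stride + x0];
            const double sq = sumSq[y1 * stride + x1] - sumSq[y0 * stride + x1]
                            - sumSq[y1 * stride + x0] + sumSq[y0 * stride + x0];

            const double mean = s / n;
            const double variance = std::max(0.0, sq / n - mean * mean);
            const double stdDev = std::sqrt(variance);

            // Sauvola: T = mean * (1 + k * (stdDev / R - 1))
            const double threshold = mean * (1.0 + k * (stdDev / R - 1.0));
            const std::size_t p = static_cast<std::size_t>(y) * width + x;
            mask[p] = gray[p] < threshold ? 1 : 0;
        }
    }
    return mask;
}
```

The threshold depends on both the window size and k, which is the tuning problem described above: a value of k that cleanly separates high-contrast print can miss a faint stroke entirely, while a smaller k that catches the faint stroke may pick up more noise on other images. Similarly, the window has to be roughly matched to the stroke and font size of the document.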