renard314 / textfairy

Android OCR App



Adding Santali OCR

Prasanta-Hembram opened this issue

I have recently used Tesseract with sat.traineddata, and I think it may work with Tesseract 3.0. Can you add it to textfairy?

File: https://github.com/indic-ocr/tessdata/tree/master/sat

Thanks, I will update the language list in my next update or the one after!

@renard314, with some effort I have tried to create Santali tessdata: https://github.com/Prasanta-Hembram/Santali-tessdata. Please have a look.

Will it work with Tess4? Have you tried the Devanagari script model from https://github.com/tesseract-ocr/tessdata_fast/blob/master/script/Devanagari.traineddata to see if it performs any better?

Actually, Santali is written in Ol Chiki; in fact, the Santali Wikipedia uses Ol Chiki instead of Devanagari. I tried Devanagari anyway, but it doesn't output Ol Chiki. The link I sent you earlier only supported 3.0. I have tried this tessdata link in gImageReader (a third-party app) and got excellent results; it works with 4.0 as well. I think you can use this link.

@renard314 Here is the result after using this. It is @rkvsraman's tessdata, which works perfectly for detecting Santali text in images.

My test image is below:

Hey, it may be a silly thing to do, but I have tried it in your app by renaming sat.traineddata to eng.traineddata and using it to replace the English data file. It works well; in fact, it works fantastically. I have made a video on my YouTube channel explaining how to do it :-) https://youtu.be/smY5NW7I_FQ. If you want, I can remove the video ;-).

Thanks @renard314 for adding Santali Language.