project-anuvaad / anuvaad-ocr-model

Open source OCR models for Indic Languages

This repository contains links to OCR models for popular Indian languages, developed as part of the Anuvaad project.

Please reach out to nlp-nmt@tarento.com for any clarification on the interpretation or usage of the linked models.

This work is licensed under the MIT License.

Tesseract Language Models

The models below were trained using Tesseract-OCR.

| Language | Model |
| --- | --- |
| Hindi | anuvaad_hin.traineddata |
| Bengali | anuvaad_ben.traineddata |
| Kannada | anuvaad_kan.traineddata |
| Malayalam | anuvaad_mal.traineddata |
| Marathi | anuvaad_mar.traineddata |
| Odia | anuvaad_ori.traineddata |
| Tamil | anuvaad_tam.traineddata |
| Telugu | anuvaad_tel.traineddata |
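
The repository lists the model files but not how to run them, so here is a minimal sketch of one way to use them through the standard `tesseract` command line. It assumes the `.traineddata` files have been downloaded into a local tessdata directory and that the `tesseract` binary is on the `PATH`. The helper names (`ANUVAAD_MODELS`, `build_tesseract_command`, `recognize`) and the example paths are illustrative and not part of this repository.

```python
import subprocess

# Language -> Tesseract language code, i.e. the file name from the table
# above without the ".traineddata" suffix.
ANUVAAD_MODELS = {
    "Hindi": "anuvaad_hin",
    "Bengali": "anuvaad_ben",
    "Kannada": "anuvaad_kan",
    "Malayalam": "anuvaad_mal",
    "Marathi": "anuvaad_mar",
    "Odia": "anuvaad_ori",
    "Tamil": "anuvaad_tam",
    "Telugu": "anuvaad_tel",
}


def build_tesseract_command(image_path, language, tessdata_dir, psm=3):
    """Build the tesseract CLI call for one image (hypothetical helper).

    tessdata_dir is assumed to be the folder holding the downloaded
    .traineddata files; psm=3 is Tesseract's default page segmentation mode.
    """
    model = ANUVAAD_MODELS[language]
    return [
        "tesseract", str(image_path), "stdout",  # "stdout" prints the text
        "--tessdata-dir", str(tessdata_dir),
        "-l", model,
        "--psm", str(psm),
    ]


def recognize(image_path, language, tessdata_dir):
    """Run Tesseract and return the recognized text as a string."""
    cmd = build_tesseract_command(image_path, language, tessdata_dir)
    result = subprocess.run(cmd, check=True, capture_output=True, encoding="utf-8")
    return result.stdout
```

For example, `recognize("page.png", "Hindi", "/path/to/tessdata")` would run OCR on `page.png` with `anuvaad_hin.traineddata`, provided that file exists in the given directory.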



Tesseract Scene-Text Language Models

| Language | Model |
| --- | --- |
| Hindi | anuvad_hin_scene_text_real.traineddata |
| Tamil | anuvad_tam_scene_text_real.traineddata |
| Scene-Text Judgement Line Detection V1 | scene_text_judgement_line_detection_v1_model.pth |


Layout Detection Models

The layout models below were trained using Layout Parser (Detectron2).

| Name | Model |
| --- | --- |
| Anuvaad Judgement Line Detection | anuvaad_line_v1.pth |
| Anuvaad Scene-Text Line Detection | scene_text_judgement_line_detection_v1_model.pth |
| Anuvaad Judgement Layout | model_final.pth |
| Anuvaad Table Layout | judgement_prima_table_layout_modelv3.pth |
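
As a rough sketch of how one of these weight files might be loaded, the snippet below uses Layout Parser's `Detectron2LayoutModel`. It rests on several assumptions: `layoutparser` and `detectron2` are installed, and a Detectron2 config YAML matching the weights is available. This listing names only the `.pth` files, so the config path in the usage note is a placeholder. The class labels are not documented here either, so no `label_map` is passed. The helper names `layout_model_kwargs` and `load_layout_model` are illustrative.

```python
def layout_model_kwargs(weights_path, config_path, score_thresh=0.5):
    """Collect constructor arguments for Detectron2LayoutModel (hypothetical helper).

    config_path is an assumption: a Detectron2 YAML config that matches the
    weights, which this listing does not provide.
    """
    return {
        "config_path": str(config_path),
        "model_path": str(weights_path),
        # Detectron2 config override: minimum confidence for kept detections.
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_thresh],
    }


def load_layout_model(weights_path, config_path, score_thresh=0.5):
    """Load a layout model; imports lazily so the helper above works standalone."""
    import layoutparser as lp  # requires layoutparser and detectron2

    kwargs = layout_model_kwargs(weights_path, config_path, score_thresh)
    return lp.Detectron2LayoutModel(**kwargs)
```

With a suitable config, `load_layout_model("anuvaad_line_v1.pth", "path/to/config.yaml").detect(image)` would return the detected regions for an image loaded as an array. The score threshold of 0.5 is an arbitrary starting point, not a value recommended by the project.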


