anuvaad-ocr-model

Open source OCR models for Indic Languages

This repository contains links to OCR models for popular Indian languages, developed as part of the Anuvaad project.

Please reach out to nlp-nmt@tarento.com with any questions about the clarification, interpretation, or usage of the linked models.

This work is licensed under

Tesseract Language Models

The models below were trained using Tesseract OCR.

Language	Model
Hindi	anuvad_hin_scene_text_real.traineddata
Tamil	anuvad_tam_scene_text_real.traineddata
Scene-Text Judgement Line Detection V1	scene_text_judgement_line_detection_v1_model.pth
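
A minimal sketch of running one of the Tesseract models above, assuming the `.traineddata` file has been downloaded into a local folder and that a recent Tesseract (4.x or 5.x) is on `PATH`. Tesseract picks a model from the file name without its `.traineddata` extension (passed to `-l`), and `--tessdata-dir` points it at the folder that holds the file. The `tessdata` folder and `page.png` image are hypothetical placeholders, and the helper names are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_lang(traineddata: str) -> str:
    """Return the -l value Tesseract expects: the file name minus .traineddata."""
    path = Path(traineddata)
    if path.suffix != ".traineddata":
        # .pth files are Detectron2 weights, not Tesseract models
        raise ValueError(f"not a Tesseract model: {traineddata}")
    return path.stem


def build_command(image: str, traineddata: str) -> list[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    model = Path(traineddata)
    return [
        "tesseract",
        image,
        "stdout",  # special output base: write text to stdout, not a file
        "--tessdata-dir",
        str(model.parent),  # hypothetical folder holding the downloaded model
        "-l",
        tesseract_lang(model.name),
    ]


def run_ocr(image: str, traineddata: str) -> str:
    """Run Tesseract and return its text output; fails if it is not installed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_command(image, traineddata),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually run Tesseract.
    cmd = build_command("page.png", "tessdata/anuvad_hin_scene_text_real.traineddata")
    print(" ".join(cmd))
```

The same command works from a shell, e.g. `tesseract page.png stdout --tessdata-dir tessdata -l anuvad_tam_scene_text_real` for the Tamil model.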

The layout models below were trained using Layout Parser (Detectron2).

Task	Model
Anuvaad Judgement Line Detection	anuvaad_line_v1.pth
Anuvaad Scene-Text Line Detection	scene_text_judgement_line_detection_v1_model.pth
Anuvaad Judgement Layout	model_final.pth
Anuvaad Table Layout	judgement_prima_table_layout_modelv3.pth
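
A minimal sketch of loading one of these weights files with LayoutParser's `Detectron2LayoutModel`, assuming `layoutparser` and Detectron2 are installed. A `.pth` file alone is not enough: Detectron2 also needs the config YAML the model was trained with, which is not listed here, so `config_path`, the `label_map`, the task keys, and the 0.5 score threshold below are all hypothetical placeholders.

```python
from pathlib import Path

# Hypothetical task keys mapped to the weights files in the table above.
LAYOUT_WEIGHTS = {
    "judgement_line": "anuvaad_line_v1.pth",
    "scene_text_line": "scene_text_judgement_line_detection_v1_model.pth",
    "judgement_layout": "model_final.pth",
    "table_layout": "judgement_prima_table_layout_modelv3.pth",
}


def layout_model_kwargs(task, weights_dir, config_path, label_map, score_thresh=0.5):
    """Collect keyword arguments for lp.Detectron2LayoutModel.

    config_path and label_map must match how the model was trained;
    the values passed in below are assumptions, not published settings.
    """
    if task not in LAYOUT_WEIGHTS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(LAYOUT_WEIGHTS)}")
    return {
        "config_path": config_path,
        "model_path": str(Path(weights_dir) / LAYOUT_WEIGHTS[task]),
        "label_map": label_map,
        # Detectron2 option: drop detections scoring below this threshold
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_thresh],
    }


def load_layout_model(**kwargs):
    """Instantiate the model; imported lazily so the helpers above run without it."""
    import layoutparser as lp  # requires layoutparser with Detectron2 support

    return lp.Detectron2LayoutModel(**kwargs)


if __name__ == "__main__":
    kwargs = layout_model_kwargs(
        "table_layout",
        weights_dir="models",                # hypothetical download folder
        config_path="configs/table.yaml",    # hypothetical training config
        label_map={0: "Table"},              # hypothetical class labels
    )
    print(kwargs)
```

Once loaded, `model.detect(image)` runs detection on an RGB image array (for example `cv2.imread(path)[..., ::-1]`) and returns a LayoutParser `Layout` of detected blocks.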

MIT License