ChillarAnand / likitham

Indic languages computing resources with a focus on Telugu

Likitham

This repo contains scripts and datasets for processing Telugu language data.

Scripts

Check out the module docstring of each script for usage instructions.
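A minimal sketch of one way to read a script's module docstring without running the script, using the standard-library `ast` module. The helper name `module_docstring` is hypothetical and not part of this repo, and the sketch assumes the script parses under the Python version you run it with.

```python
import ast
from pathlib import Path


def module_docstring(script_path):
    """Return the module-level docstring of a Python file, or None if absent."""
    source = Path(script_path).read_text(encoding="utf-8")
    # Parse only; the script itself is never executed.
    tree = ast.parse(source)
    return ast.get_docstring(tree)
```

Calling `print(module_docstring("some_script.py"))` (the file name is a placeholder) prints the usage notes for that script, or `None` if it has no module docstring.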

Models

te.pyrnn.gz - Telugu language model (LSTM + CTC) trained with ocropy
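A rough sketch of how the model might be applied to text-line images, assuming ocropy is installed and its `ocropus-rpred` command is on your PATH. The helper names `rpred_command` and `recognize` are hypothetical, and the image path in the usage note is a placeholder.

```python
import subprocess


def rpred_command(model_path, line_images):
    """Build the argument list for ocropy's ocropus-rpred line recognizer."""
    if not line_images:
        raise ValueError("at least one line image is required")
    # -m selects the recognition model to load.
    return ["ocropus-rpred", "-m", str(model_path)] + [str(p) for p in line_images]


def recognize(model_path, line_images):
    """Run ocropus-rpred; raises CalledProcessError if the command fails."""
    subprocess.run(rpred_command(model_path, line_images), check=True)
```

For example, `recognize("te.pyrnn.gz", ["line.bin.png"])` would run the recognizer on one line image. The inputs are assumed to be binarized, segmented line images, such as those produced by ocropy's own preprocessing tools.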

Dataset

Sample training data. You can use the scripts in this repo to generate customized training data.

Useful links

Telugu fonts

Telugu POS tagger

Isolated Handwritten Telugu Character Dataset

Telugu and other South Asian language data

Corpus search engine

tesseract-te - Tesseract Open Source OCR Engine

banti_telugu_ocr - End-to-end OCR system for Telugu, based on convolutional neural networks.

Chamanti_ocr - Telugu OCR framework using RNN and CTC in Theano and Python 3.

http://docs.cltk.org/en/latest/telugu.html

http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=264&lang=en

http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1892&lang=en

http://ildc.in/Telugu/htm/lin_ocr_spell.htm

