Indian-Scene-Text-Classification

The Indian scene text classification model was developed as part of the Indian Signboard Translation Project by AI4Bharat. I worked on this project under the mentorship of Mitesh Khapra and Pratyush Kumar from IIT Madras.

Indian Signboard Translation involves 4 modular tasks:

T1: Detection: Detecting bounding boxes containing text in the images
T2: Classification: Classifying the language of the text in the bounding box identified by T1
T3: Recognition: Reading the text from the crop detected by T1, using the recognition model for the language classified by T2
T4: Translation: Translating the text from T3 from one Indian language to another Indian language

Note: The picture above has not yet been updated to include T2: Classification.
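
The four tasks above compose into a simple per-box loop. The following is a minimal sketch of that composition, not the project's actual code: the function name translate_signboard, the SignboardText record, the box format and all callable signatures are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class SignboardText:
    box: Box
    language: str
    text: str
    translation: str


def translate_signboard(
    image: object,
    detect: Callable[[object], List[Box]],                  # T1
    classify: Callable[[object, Box], str],                 # T2
    recognizers: Dict[str, Callable[[object, Box], str]],   # T3, one per language
    translate: Callable[[str, str, str], str],              # T4
    target_language: str,
) -> List[SignboardText]:
    """Run T1 to T4 on one image and return one record per detected box."""
    results = []
    for box in detect(image):
        language = classify(image, box)
        # T2's prediction selects which recognition model reads the crop.
        text = recognizers[language](image, box)
        translation = translate(text, language, target_language)
        results.append(SignboardText(box, language, text, translation))
    return results
```

Passing each stage in as a callable keeps the tasks modular: any one of them can be replaced without touching the others, and an error in T2 visibly propagates to T3 because it picks the wrong recognizer.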

Dataset

The Indian Scene Text Classification Dataset (D2 + D2-English) is used to train this model.

Model

A modified version of the Convolutional Recurrent Neural Network (CRNN) is used as the classification model. ResNet-18, initialised with weights pretrained on ImageNet, serves as the image feature extractor. Bidirectional gated recurrent units (GRUs) then learn from the spatially sequential output of the CNN. Finally, a linear output layer classifies the language from the flattened sequence of features produced by the RNN.

Input Image Shape: [200, 50]
CNN Output Shape: [13, 256]
RNN Output Shape: [13, 16]
Linear Output Shape: [5]
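
The shapes above can be checked with a little arithmetic. The sketch below traces a 200 x 50 image through the standard ResNet-18 stages using the usual convolution output-size formula. It rests on several assumptions that are not stated in this README: that the input shape is [width, height], that the feature extractor is cut after ResNet-18's layer3 (the stage with 256 channels), that the remaining height is collapsed (for example by pooling), and that the GRU has 8 hidden units per direction. The notebook remains the authoritative definition.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_trunk_shape(width: int, height: int):
    """Trace (channels, height, width) through ResNet-18 up to layer3.

    Each tuple is (name, out_channels, kernel, stride, padding) of the
    only size-changing operation in that stage of the standard ResNet-18.
    """
    stages = [
        ("conv1", 64, 7, 2, 3),
        ("maxpool", 64, 3, 2, 1),
        ("layer1", 64, 3, 1, 1),
        ("layer2", 128, 3, 2, 1),
        ("layer3", 256, 3, 2, 1),
    ]
    channels = 3
    for _name, channels, kernel, stride, padding in stages:
        width = conv_out(width, kernel, stride, padding)
        height = conv_out(height, kernel, stride, padding)
    return channels, height, width


channels, height, width = resnet18_trunk_shape(width=200, height=50)

# Assumption: the width axis becomes the sequence axis for the RNN.
seq_len, cnn_features = width, channels      # matches CNN output [13, 256]

hidden_size = 8                              # assumed GRU units per direction
rnn_features = 2 * hidden_size               # forward + backward, matches [13, 16]

linear_in = seq_len * rnn_features           # flattened RNN output
num_classes = 5                              # matches linear output [5]

print("CNN output:", [seq_len, cnn_features])
print("RNN output:", [seq_len, rnn_features])
print("Linear layer:", linear_in, "->", num_classes)
```

Under these assumptions the sequence length of 13 comes from halving the 200-pixel width four times (200, 100, 50, 25, 13), and the linear layer sees 13 x 16 = 208 input features.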

Training

The classification model is trained for 30 epochs with the following hyperparameters. The model weights are saved every 3 epochs and are available in the Models directory.

train_batch_size = 64
lr = 0.0001
weight_decay = 0.01
lr_step_size = 5
lr_gamma = 0.9
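
To make the schedule concrete, the sketch below computes the learning rate per epoch and the epochs at which weights are saved. It assumes that lr_step_size and lr_gamma follow the step-decay convention of PyTorch's StepLR (multiply the learning rate by lr_gamma every lr_step_size epochs) and that epochs are numbered from 1; num_epochs and save_every are names introduced here for the values stated above. The weight_decay setting is not modelled.

```python
lr = 0.0001
lr_step_size = 5
lr_gamma = 0.9
num_epochs = 30   # from the text: trained for 30 epochs
save_every = 3    # from the text: weights saved every 3 epochs


def lr_at_epoch(epoch: int) -> float:
    """Learning rate used during 1-indexed `epoch` under step decay."""
    return lr * lr_gamma ** ((epoch - 1) // lr_step_size)


# Epochs at which a checkpoint is written, assuming 1-indexed epochs.
checkpoint_epochs = [e for e in range(1, num_epochs + 1) if e % save_every == 0]

for epoch in (1, 6, 12, 30):
    print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.3e}")
print("checkpoints at epochs:", checkpoint_epochs)
```

With gamma = 0.9 the decay is gentle: under these assumptions the learning rate in the final epoch is still about 59% of its initial value.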

For detailed model architecture and its parameters, check the Define model section of the notebook 1-Language-Classification.ipynb

Performance

The lowest validation loss is observed in epoch 12. Hence, the model Models/Language-Classifier-e12.pth is used to evaluate the classification performance.

Train Accuracy	Val Accuracy	Test Accuracy
0.986	0.943	0.956

The language confusion matrix for the test set is shown below:

Because the characters of Tamil and Malayalam, and of Hindi and Punjabi, resemble each other more closely than those of other language pairs, many misclassifications occur within these two pairs.
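
Pairwise confusions like these can be read off programmatically from true and predicted labels. The sketch below is one way to do so; the helper names are hypothetical, and the six labels in the example are made-up toy values chosen only to exercise the code. They are not results from this model.

```python
from collections import Counter
from typing import List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str]) -> Counter:
    """Count (true label, predicted label) occurrences."""
    return Counter(zip(y_true, y_pred))


def most_confused_pairs(counts: Counter, k: int = 2) -> List[Tuple[Tuple[str, ...], int]]:
    """Rank unordered language pairs by total errors in both directions."""
    errors = Counter()
    for (true_label, pred_label), n in counts.items():
        if true_label != pred_label:
            errors[tuple(sorted((true_label, pred_label)))] += n
    return errors.most_common(k)


# Toy labels for illustration only (not taken from the test set).
y_true = ["Tamil", "Tamil", "Malayalam", "Hindi", "Punjabi", "English"]
y_pred = ["Tamil", "Malayalam", "Tamil", "Punjabi", "Punjabi", "English"]

counts = confusion_counts(y_true, y_pred)
print(most_confused_pairs(counts))
```

Summing errors in both directions treats "Tamil predicted as Malayalam" and "Malayalam predicted as Tamil" as the same confusion, which matches how the pairs are discussed above.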

Misclassification samples from the test set:

Code

Language-Classification.ipynb

MIT License

gokulkarthik / Indian-Scene-Text-Classification