The Indian scene text classification model was developed as part of the Indian Signboard Translation Project by AI4Bharat. I worked on this project under the mentorship of Mitesh Khapra and Pratyush Kumar from IIT Madras.
Indian Signboard Translation involves 4 modular tasks:
- T1: Detection: Detecting bounding boxes containing text in the images
- T2: Classification: Classifying the language of the text in the bounding box identified by T1
- T3: Recognition: Getting the text from the crop detected by T1, using the recognition model selected by the T2 classification
- T4: Translation: Translating the text from T3 from one Indian language to another Indian language
Note: T2: Classification is not updated in the above picture.
The Indian Scene Text Classification Dataset (D2 + D2-English) is used to train this model.
The classification model is a modified version of the Convolutional Recurrent Neural Network (CRNN). It uses resnet-18, initialised with weights pretrained on ImageNet, as the image feature extractor. Bidirectional gated recurrent units (GRUs) then learn from the spatially sequential output of the CNN part. Finally, a linear output layer classifies the language, taking as input the flattened sequence of features produced by the RNN part.
- Input Image Shape: [200, 50]
- CNN Output Shape: [13, 256]
- RNN Output Shape: [13, 16]
- Linear Output Shape: [5]
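The shapes above can be checked with a little arithmetic. The sketch below is not taken from the notebook; it assumes that the input is 200 pixels wide, that the feature extractor is the standard resnet-18 truncated after `layer3` (the stage with 256 channels), and that the GRU has a hidden size of 8 per direction. Under those assumptions the width shrinks from 200 to 13 time steps, and the linear layer receives 13 × 16 = 208 flattened features.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumption: standard resnet-18 downsampling stages up to layer3.
# Each entry is (name, kernel, stride, padding) along the width axis.
# layer1 has stride 1 and keeps the width unchanged, so it is omitted.
STAGES = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
]


def trace_width(width):
    """Return the width after each downsampling stage, starting with the input."""
    widths = [width]
    for _name, kernel, stride, padding in STAGES:
        widths.append(conv_out(widths[-1], kernel, stride, padding))
    return widths


widths = trace_width(200)          # input image width from the list above
seq_len = widths[-1]               # number of time steps fed to the RNN
rnn_hidden = 8                     # assumed hidden size per GRU direction
rnn_features = 2 * rnn_hidden      # bidirectional: forward + backward states
linear_in = seq_len * rnn_features # flattened input size of the linear layer

print(widths, rnn_features, linear_in)
```

The actual layer definitions are in the notebook; this only shows that the listed shapes are consistent with one plausible configuration.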
The classification model is trained for 30 epochs. The model weights are saved every 3 epochs and can be found in the Models directory. The training hyperparameters are:
- train_batch_size = 64
- lr = 0.0001
- weight_decay = 0.01
- lr_step_size = 5
- lr_gamma = 0.9
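To make the learning-rate settings concrete, here is a minimal sketch of the schedule they would produce under a step-decay rule. It assumes that `lr_step_size` and `lr_gamma` are used the way PyTorch's `StepLR` scheduler uses `step_size` and `gamma`, and that epochs are counted from 0; the function `step_lr` is a hypothetical helper, not code from the notebook.

```python
def step_lr(epoch, base_lr=1e-4, step_size=5, gamma=0.9):
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 30 training epochs.
schedule = [step_lr(epoch) for epoch in range(30)]

for epoch in (0, 5, 10, 29):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.3e}")
```

With these values the decay is gentle: the rate drops by 10% every 5 epochs and is still about 59% of its initial value in the final epoch.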
For the detailed model architecture and its parameters, check the Define model section of the notebook 1-Language-Classification.ipynb.
The lowest validation loss is observed at epoch 12. Hence, the model Models/Language-Classifier-e12.pth is used to evaluate the classification performance.
Train Accuracy | Val Accuracy | Test Accuracy |
---|---|---|
0.986 | 0.943 | 0.956 |
The language confusion matrix of the test set is shown below:
Because the characters of Tamil & Malayalam and of Hindi & Punjabi are more similar to each other than those of other language pairs, many misclassifications occur within these two pairs.
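One way to read such confused pairs off a set of predictions is to count errors per unordered language pair. The sketch below is illustrative only: `confusion_counts` and `most_confused_pairs` are hypothetical helpers, and the labels are toy values invented for the example, not results from the test set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) label combination occurs."""
    return Counter(zip(y_true, y_pred))


def most_confused_pairs(counts, top=3):
    """Rank unordered label pairs by their total number of mutual errors."""
    pair_errors = Counter()
    for (true_label, pred_label), n in counts.items():
        if true_label != pred_label:
            # Tamil->Malayalam and Malayalam->Tamil count towards the same pair.
            pair_errors[frozenset((true_label, pred_label))] += n
    return [(tuple(sorted(pair)), n) for pair, n in pair_errors.most_common(top)]


# Toy labels for illustration only (hypothetical, not real model output).
y_true = ["Tamil", "Tamil", "Malayalam", "Hindi", "Punjabi", "Hindi", "English"]
y_pred = ["Malayalam", "Tamil", "Tamil", "Punjabi", "Punjabi", "Hindi", "English"]

counts = confusion_counts(y_true, y_pred)
pairs = most_confused_pairs(counts)
print(pairs)
```

Summing both directions of an error treats the pair symmetrically, which matches the way the confusions are described above.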
Misclassification Samples in Testset: