Classification of a Handwritten Character Dataset and a Consonant Vowel (CV) Segment Dataset

Objective

The main aim of this project is to use RNN and LSTM models to classify two datasets: a handwritten character dataset of Kannada/Telugu script characters stored in coordinate form, and a Consonant Vowel (CV) segment dataset drawn from conversational speech in Hindi.

Handwritten Character Dataset

Data

There are five characters: a, aI, bA, dA and lA. Each character sample is stored in a .txt file as a sequence of 2-dimensional points (x and y coordinates).
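The exact layout of the .txt files is not described here, so the loader below is a minimal sketch under an assumption: values are whitespace-separated, and when their count is odd the first value is treated as a point count and dropped. The function names `parse_stroke_text` and `normalize_points` are hypothetical. Centering and scaling each character is a common preprocessing step for coordinate sequences, shown here as one option rather than the project's confirmed pipeline.

```python
import numpy as np


def parse_stroke_text(text: str) -> np.ndarray:
    """Parse one character file's contents into an (N, 2) array of (x, y) points.

    Assumption (not confirmed by the dataset description): values are
    whitespace-separated, optionally preceded by a single point count.
    """
    values = np.array(text.split(), dtype=float)
    if values.size % 2 == 1:
        values = values[1:]  # assumed leading point count; drop it
    return values.reshape(-1, 2)


def normalize_points(points: np.ndarray) -> np.ndarray:
    """Shift points to zero mean and scale so the largest |coordinate| is 1."""
    centered = points - points.mean(axis=0)
    scale = np.abs(centered).max()
    return centered / scale if scale > 0 else centered
```

For example, `normalize_points(parse_stroke_text("3 0 0 2 0 2 2"))` reads three points and maps them to roughly `[[-1, -0.5], [0.5, -0.5], [0.5, 1]]`, which removes differences in where and how large a character was written.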

Model

RNN

Accuracy on Train Set: 0.96
Accuracy on Test Set: 0.94
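To make the RNN baseline concrete, here is a minimal NumPy sketch of a many-to-one vanilla (Elman) RNN classifier: it reads the whole point sequence, updates the hidden state with $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$, and classifies from the last hidden state. It shows the forward pass only with random, untrained weights; the class name `VanillaRNNClassifier`, the hidden size and the initialization are illustrative assumptions, not the notebook's actual model, which was presumably built and trained with a deep learning framework.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class VanillaRNNClassifier:
    """Hypothetical many-to-one Elman RNN: classify from the final hidden state."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)  # small uniform init (illustrative choice)
        self.W_x = rng.uniform(-s, s, (hidden_dim, input_dim))
        self.W_h = rng.uniform(-s, s, (hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.W_o = rng.uniform(-s, s, (num_classes, hidden_dim))
        self.b_o = np.zeros(num_classes)

    def forward(self, seq: np.ndarray) -> np.ndarray:
        """seq: (T, input_dim) -> class probabilities of shape (num_classes,)."""
        h = np.zeros(self.b.shape)
        for x_t in seq:
            # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
            h = np.tanh(self.W_x @ x_t + self.W_h @ h + self.b)
        return softmax(self.W_o @ h + self.b_o)
```

For the handwritten data, `input_dim=2` (x, y) and `num_classes=5` (a, aI, bA, dA, lA). Because the same `W_h` is applied at every step, gradients through long sequences are repeatedly multiplied by it, which is the usual reason vanilla RNNs struggle with long-range dependencies.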

LSTM

Accuracy on Train Set: 0.98
Accuracy on Test Set: 0.97
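The LSTM replaces the single tanh update with gated updates of a separate cell state. The sketch below is a minimal NumPy version of the standard LSTM step, assuming the four gate blocks are stacked in one weight matrix in the order input, forget, output, candidate (an arbitrary convention chosen here; frameworks order them differently). `lstm_step` and `lstm_last_hidden` are hypothetical names, not the notebook's code.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W: (4*H, D + H), b: (4*H,). Row blocks are assumed to be ordered as
    input gate, forget gate, output gate, candidate update.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)
    return h, c


def lstm_last_hidden(seq, W, b, hidden_dim):
    """Run the LSTM over a (T, D) sequence and return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, W, b)
    return h
```

The additive update `c = f * c_prev + i * g` is the key design choice: when the forget gate stays near 1, information and gradients can pass across many time steps without being squashed by a repeated matrix multiplication, which is why LSTMs are generally expected to handle long sequences better than a vanilla RNN.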

Confusion Matrix
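A confusion matrix counts, for each true class, how often each class was predicted; overall accuracy is the diagonal sum divided by the total. The helper below is a minimal sketch assuming labels are already integer-encoded (for example a=0, aI=1, bA=2, dA=3, lA=4, a hypothetical mapping). In practice `sklearn.metrics.confusion_matrix` provides the same counts.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy_from_cm(cm):
    # Correct predictions lie on the diagonal.
    return np.trace(cm) / cm.sum()
```

With `y_true = [0, 0, 1, 2, 3, 4]` and `y_pred = [0, 1, 1, 2, 3, 4]`, one "a" sample is mistaken for "aI", so `cm[0, 1] == 1` and the accuracy is 5/6. Off-diagonal cells show which character pairs the model confuses, which a single accuracy number hides.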

Consonant Vowel (CV) Segment Dataset

Data

This dataset consists of a subset of CV segments from conversational speech in Hindi. Training and test data are provided separately inside the folder for each CV segment class, and the samples of each class are represented by 39-dimensional Mel-frequency cepstral coefficient (MFCC) features.
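Speech segments differ in duration, so a batch of utterances does not form a rectangular array. A common approach, sketched below, is to zero-pad every sequence to the longest one and keep the true lengths plus a boolean mask so the recurrent model can ignore padded frames. This assumes each segment has already been loaded as a `(T, 39)` NumPy array; the file format and the name `pad_batch` are assumptions, not taken from the project. Frameworks offer equivalents, such as `torch.nn.utils.rnn.pad_sequence`.

```python
import numpy as np


def pad_batch(sequences, feature_dim=39):
    """Zero-pad variable-length (T_i, feature_dim) arrays into one batch.

    Returns:
        batch:   (B, T_max, feature_dim) array, zeros after each sequence ends
        lengths: (B,) true sequence lengths
        mask:    (B, T_max) boolean, True where a frame is real
    """
    for s in sequences:
        if s.ndim != 2 or s.shape[1] != feature_dim:
            raise ValueError(f"expected (T, {feature_dim}) array, got {s.shape}")
    lengths = np.array([len(s) for s in sequences])
    batch = np.zeros((len(sequences), lengths.max(), feature_dim))
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
    mask = np.arange(lengths.max())[None, :] < lengths[:, None]
    return batch, lengths, mask
```

For a many-to-one classifier, the hidden state used for prediction should be the one at step `lengths[i] - 1` (or the mask should freeze the state after that step); otherwise the trailing zero frames keep updating the state and can distort the prediction for short segments.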

Model

RNN

Accuracy on Train Set: 0.988
Accuracy on Test Set: 0.899

LSTM

Accuracy on Train Set: 0.997
Accuracy on Test Set: 0.879

Confusion Matrix

Conclusion

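Both datasets consist of long sequences. On the handwritten character dataset, the LSTM outperforms the standard RNN on both the training set (0.98 vs 0.96) and the test set (0.97 vs 0.94). On the CV segment dataset, the LSTM reaches higher training accuracy (0.997 vs 0.988) but lower test accuracy (0.879 vs 0.899), which suggests it overfits the training data; on this dataset the results do not show an advantage for the LSTM over the standard RNN.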
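One simple way to quantify the overfitting argument is the train-test accuracy gap for each model. The snippet below is only arithmetic on the accuracies reported above; the dictionary name `REPORTED` and the helper `generalization_gaps` are illustrative.

```python
# Accuracies as reported in this README: (train, test)
REPORTED = {
    ("Handwritten", "RNN"): (0.96, 0.94),
    ("Handwritten", "LSTM"): (0.98, 0.97),
    ("CV", "RNN"): (0.988, 0.899),
    ("CV", "LSTM"): (0.997, 0.879),
}


def generalization_gaps(results):
    """Return train accuracy minus test accuracy for every (dataset, model)."""
    return {key: train - test for key, (train, test) in results.items()}


for (dataset, model), gap in generalization_gaps(REPORTED).items():
    print(f"{dataset:<12} {model:<5} train-test gap = {gap:.3f}")
```

The gaps come out to 0.020 (RNN) and 0.010 (LSTM) on the handwritten data, but 0.089 (RNN) and 0.118 (LSTM) on the CV data. The LSTM's larger gap on CV segments is consistent with overfitting, and regularization such as dropout or early stopping would be a natural next thing to try.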
