deep-learning deep-neural-networks deeplearning python audio audio-processing mfcc mfcc-extractor mfccs mfcc-features spectrogram cnn cnn-architecture cnn-classification cnn-keras cnn-model keras machine-learning speech-recognition tensoflow

Speech Recognition Using Deep Learning

This was my project for the Machine Learning course of my Master's degree. It uses deep learning for speech recognition, more specifically to recognize which word is spoken in an audio track.

I ran the experiment with two widely used audio features: spectrograms and MFCCs (Mel Frequency Cepstral Coefficients).

To run the implementation, first download the dataset (instructions are in the dataset folder) and run the prepare_dataset.py file that corresponds to the feature you want to use. This Python script creates a file called data.json containing the features used to train the model. Then run the corresponding train.py file to train the model; when training finishes, the model is saved (two already-trained models are provided, model_spectograms.h5 and model_mfccs.h5). Finally, put the tracks you want to make predictions about in the test folder, change the path to the track in the main function of the corresponding predictions.py file, and run it.
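The preparation step can be illustrated with a NumPy-only sketch of both features. This is not the code in the project's prepare_dataset.py files: the parameter values (22050 Hz, 1-second tracks, 2048-sample FFT, hop of 512, 40 mel filters, 13 coefficients), the data.json keys ("mapping", "labels", "features") and all helper names are assumptions for illustration. Loading audio from disk is left out, so tracks are passed in as NumPy arrays. A library such as librosa would normally do this work; the frames here are not centre-padded, so frame counts differ from librosa's defaults.

```python
import json

import numpy as np

# Assumed parameters for illustration; the project's scripts may use other values.
SAMPLE_RATE = 22050  # samples per second
NUM_SAMPLES = 22050  # every track cropped or zero-padded to 1 s
N_FFT = 2048         # FFT window length
HOP_LENGTH = 512     # step between consecutive frames
N_MELS = 40          # number of mel filters
N_MFCC = 13          # number of cepstral coefficients kept


def fix_length(signal, num_samples=NUM_SAMPLES):
    """Crop or zero-pad a 1-D signal so every track yields the same feature shape."""
    signal = np.asarray(signal, dtype=np.float64)[:num_samples]
    return np.pad(signal, (0, num_samples - len(signal)))


def power_spectrogram(signal, n_fft=N_FFT, hop_length=HOP_LENGTH):
    """|STFT|^2 with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectrogram(signal):
    """Spectrogram feature in decibels."""
    return 10.0 * np.log10(power_spectrogram(signal) + 1e-10)


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis


def mfcc(signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC):
    """MFCCs, shape (n_frames, n_mfcc): log mel energies decorrelated by a DCT."""
    mel_energies = power_spectrogram(signal) @ mel_filterbank(sr).T
    return np.log(mel_energies + 1e-10) @ dct_matrix(n_mfcc, N_MELS).T


def build_dataset(tracks, labels, mapping, feature="mfcc"):
    """Collect one feature matrix per track in a JSON-serialisable dict (hypothetical layout)."""
    extract = mfcc if feature == "mfcc" else log_spectrogram
    data = {"mapping": list(mapping), "labels": [], "features": []}
    for signal, label in zip(tracks, labels):
        data["labels"].append(int(label))
        data["features"].append(extract(fix_length(signal)).tolist())
    return data


def save_dataset(data, path="data.json"):
    """Write the dataset dict to disk as JSON."""
    with open(path, "w") as fp:
        json.dump(data, fp)


def predicted_word(probabilities, mapping):
    """Map a classifier's softmax output for one track back to its word."""
    return mapping[int(np.argmax(probabilities))]
```

Under these assumptions a 1-second track gives a 40 x 13 MFCC matrix or a 40 x 1025 spectrogram. If the CNN treats each matrix as a one-channel image (also an assumption), a batch of MFCCs would have shape (number of tracks, 40, 13, 1), and predicted_word turns the network's output for one track back into a word from a hypothetical mapping such as ["yes", "no"].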

Below are the loss and accuracy curves I obtained during the project.
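Curves like these can be reproduced from the History object that Keras' model.fit returns. The sketch below assumes the model was compiled with metrics=["accuracy"] and trained with validation data, so that history.history contains the keys "loss", "val_loss", "accuracy" and "val_accuracy" (older Keras versions name the accuracy keys "acc" and "val_acc"). plot_history is a hypothetical helper, not part of the project.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt


def plot_history(history, path="curves.png"):
    """Plot training/validation accuracy and loss from a Keras History.history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(2, 1, figsize=(8, 8), sharex=True)
    epochs = range(1, len(history["loss"]) + 1)

    # Top panel: accuracy per epoch (key names assume metrics=["accuracy"])
    ax_acc.plot(epochs, history["accuracy"], label="train accuracy")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation accuracy")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend(loc="lower right")

    # Bottom panel: loss per epoch
    ax_loss.plot(epochs, history["loss"], label="train loss")
    ax_loss.plot(epochs, history["val_loss"], label="validation loss")
    ax_loss.set_ylabel("Loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.legend(loc="upper right")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

For example, plot_history(model.fit(...).history, "curves_mfccs.png") would save one figure per feature type, with the arguments to fit left as in whatever training script is used.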

Curves using spectrograms

Curves using MFCCs

About

Project I carried out for the Machine Learning course of my Master's degree.


Languages

Python: 100.0%