Automatic Speech Recognizer

This project aims to build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline.

The LibriSpeech dataset is used to train and evaluate the models. The pipeline first converts raw audio into feature representations commonly used for ASR, then feeds those features to neural networks that map them to transcribed text. Two audio features are considered: MFCCs and spectrograms.
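
A spectrogram is built by slicing the signal into overlapping frames, tapering each frame with a window, and taking the log magnitude of its discrete Fourier transform. The snippet below is a minimal, standard-library-only sketch of that idea, not the project's feature extractor. The frame length, hop size, Hann window, `eps` floor and the helper names `hann_window` and `log_spectrogram` are all illustrative assumptions, and a real pipeline would use an FFT library rather than this naive DFT. MFCCs are usually derived from a spectrum like this one by applying a mel filterbank, taking the log, and then a discrete cosine transform.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point periodic Hann window (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(signal, frame_len=8, hop=4, eps=1e-10):
    """Compute a log-magnitude spectrogram using a naive DFT.

    frame_len and hop are toy values chosen for illustration only.
    Returns one list per frame; each holds frame_len // 2 + 1
    log-magnitude values, one per non-negative frequency bin.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Slice one frame and taper its edges with the window.
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # k-th DFT coefficient of the windowed frame.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            # eps keeps log() finite for empty bins.
            bins.append(math.log(abs(coeff) + eps))
        frames.append(bins)
    return frames


# Usage: a sine completing 2 cycles per 8-sample frame peaks in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = log_spectrogram(tone)
```

Each row of `spec` is one time step, which is the sequence-of-feature-vectors shape the recurrent models consume.
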
The models implemented are:

Deep RNN + TimeDistributed Dense
CNN + RNN + TimeDistributed Dense
Bidirectional RNN + TimeDistributed Dense
RNN + TimeDistributed Dense
Vanilla RNN
(Listed in decreasing order of validation accuracy.)
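
Every architecture in the list ends with a TimeDistributed Dense layer: the same dense layer, followed by a softmax, applied independently at each time step, so the network emits one probability distribution over output symbols per audio frame. This mirrors what Keras's `TimeDistributed(Dense(...))` wrapper does. The plain-Python sketch below shows that step together with a bidirectional wrapper, using random weights. It is an illustration under stated assumptions, not the project's code: a simple tanh RNN stands in for whatever recurrent cell (for example GRU or LSTM) was actually used, and the layer sizes, the character vocabulary size and the helper names `rnn_forward`, `time_distributed_dense`, `bidirectional` and `rand_matrix` are hypothetical.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_forward(inputs, w_x, w_h, b):
    """Run a simple tanh RNN over a sequence and return every hidden state.

    inputs: T feature vectors of length d_in.
    w_x: d_hidden x d_in, w_h: d_hidden x d_hidden, b: length d_hidden.
    """
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        # New state depends on the current frame and the previous state.
        h = [math.tanh(sum(wx * xi for wx, xi in zip(w_x[j], x))
                       + sum(wh * hi for wh, hi in zip(w_h[j], h))
                       + b[j])
             for j in range(len(b))]
        states.append(h)
    return states


def time_distributed_dense(states, w, b):
    """Apply the same dense layer + softmax independently at every time step."""
    outputs = []
    for h in states:
        logits = [sum(wi * hi for wi, hi in zip(w[k], h)) + b[k]
                  for k in range(len(b))]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        outputs.append([e / total for e in exps])
    return outputs


def bidirectional(inputs, fwd_params, bwd_params):
    """Concatenate a forward pass with a pass over the reversed sequence."""
    fwd = rnn_forward(inputs, *fwd_params)
    # Reverse the backward states again so both align frame by frame.
    bwd = rnn_forward(inputs[::-1], *bwd_params)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]


# Usage with assumed toy sizes: 6 frames of 4 features, 3 hidden units per
# direction, and 5 output symbols standing in for a character vocabulary.
rng = random.Random(0)
T, d_in, d_h, n_chars = 6, 4, 3, 5
features = rand_matrix(T, d_in, rng)  # stand-in for spectrogram/MFCC frames
fwd = (rand_matrix(d_h, d_in, rng), rand_matrix(d_h, d_h, rng), [0.0] * d_h)
bwd = (rand_matrix(d_h, d_in, rng), rand_matrix(d_h, d_h, rng), [0.0] * d_h)
states = bidirectional(features, fwd, bwd)
probs = time_distributed_dense(states, rand_matrix(n_chars, 2 * d_h, rng),
                               [0.0] * n_chars)
```

In the usual formulation, a deep RNN stacks several recurrent layers by feeding one layer's states into the next, and the CNN + RNN variant places a convolution over the time axis in front of the recurrent layer; both keep the same per-frame dense output.
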

This project was completed as part of the Udacity Natural Language Processing Nanodegree program.

Arjunp24 / automatic-speech-recognizer
