Gautam-J / Sound-Classification-for-deaf-people


Sound Classification for the deaf

Jahnavi Darbhamulla


UI

Dataset

The ESC-50 Dataset for Environmental Sound Classification is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification.

It contains 50 semantic classes with 40 examples each and 5 major categories:

  • Animals
  • Natural soundscapes & water sounds
  • Human, non-speech sounds
  • Interior/domestic sounds
  • Exterior/urban noises

This dataset can be downloaded as a .zip file: ESC-50 dataset

Methodology

Feature Extraction - MFCC

To perform audio classification, we first preprocess the data to extract the relevant features of the audio signal using MFCCs, and then pass those features through a deep neural network for classification. Mel-Frequency Cepstral Coefficients (MFCCs) are short-term spectral features of a signal that concisely describe the overall shape of its spectral envelope. A few MFCCs extracted from the ESC-50 dataset:

Airplane:

Dog:
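To make the feature-extraction step concrete, here is a minimal NumPy-only sketch of the MFCC pipeline (power spectrum → mel filterbank → log → DCT). This is a conceptual illustration, not librosa's implementation; in practice a single call to `librosa.feature.mfcc` does all of this with windowing, framing, and better numerics. The 440 Hz tone stands in for a real ESC-50 recording.

```python
import numpy as np

def mfcc_sketch(signal, sr=22050, n_fft=1024, n_mels=26, n_mfcc=13):
    """Minimal MFCC pipeline for a single frame of audio."""
    # Power spectrum of one frame (a real pipeline windows and frames the signal)
    spectrum = np.abs(np.fft.rfft(signal[:n_fft], n_fft)) ** 2

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency sr/2
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    mel_pts = np.linspace(0.0, mel_max, n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, center, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, center):
            fbank[m - 1, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - center, 1)

    # Log of the mel-weighted energies (small constant avoids log(0))
    log_mel = np.log(fbank @ spectrum + 1e-10)

    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return dct @ log_mel

# One second of a 440 Hz tone as a stand-in for a real recording
sr = 22050
t = np.arange(sr) / sr
coeffs = mfcc_sketch(np.sin(2 * np.pi * 440 * t), sr=sr)
print(coeffs.shape)  # (13,)
```

The first few coefficients capture the coarse shape of the spectral envelope, which is why they summarize environmental sounds so compactly.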

Convolutional Neural Networks

CNNs, or convolutional neural networks, are a type of deep learning algorithm that performs very well on image data. To use them for audio classification, we extract image-like features and reshape them so they can be fed into a CNN. We use the librosa package for the feature extraction.
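A minimal sketch of such a CNN in Keras is shown below. The input shape (40 MFCCs × 173 frames, one channel) and the layer sizes are illustrative assumptions, not the repository's exact architecture; only the number of output classes (50) comes from ESC-50.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical feature shape: 40 MFCCs x 173 frames, treated as a 1-channel image.
# In practice the features would come from librosa.feature.mfcc on each clip.
n_mfcc, n_frames, n_classes = 40, 173, 50

model = keras.Sequential([
    layers.Input(shape=(n_mfcc, n_frames, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# A single random "MFCC image" to confirm the shapes line up
dummy = np.random.rand(1, n_mfcc, n_frames, 1).astype("float32")
pred = model.predict(dummy, verbose=0)
print(pred.shape)  # (1, 50)
```

Treating the MFCC matrix as a single-channel image is what lets standard 2-D convolutions learn local time-frequency patterns in the audio.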

Output

Recurrent Neural Networks

Recurrent neural networks are a type of deep learning algorithm that can remember sequences. Audio data tends to follow temporal patterns that RNNs can exploit for classification. In contrast to the CNN model, we use a stateful LSTM, which allows us to simplify the overall network structure: all we need is an LSTM layer followed by a Dense layer.
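The LSTM-plus-Dense structure described above can be sketched in Keras as follows. The batch size, frame count, and feature count are assumptions for illustration; a stateful LSTM requires a fixed batch shape so that its internal state can carry over between successive batches.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 16 clips per batch, 173 frames of 40 MFCCs each.
batch_size, n_frames, n_mfcc, n_classes = 16, 173, 40, 50

model = keras.Sequential([
    # stateful=True keeps the LSTM's hidden state across batches,
    # which is why the full batch shape must be fixed up front.
    layers.Input(batch_shape=(batch_size, n_frames, n_mfcc)),
    layers.LSTM(128, stateful=True),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# A zero-filled batch to confirm the output shape
dummy = np.zeros((batch_size, n_frames, n_mfcc), dtype="float32")
pred = model.predict(dummy, verbose=0)
print(pred.shape)  # (16, 50)
```

With statefulness handling the sequence memory, no stacked recurrent layers or extra dense layers are needed, which matches the simplified structure described above.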

Output

Contributors

License


Made with ☕ and ❤️

About

License: MIT License

