machine-learning python3 cnn-keras mel-frequencies melspectrogram

Speech-Sentiment-Analysis-

This project tackles the challenge of identifying emotions from voice recordings. Emotions are inherently subjective and are usually inferred from visual cues such as facial expressions and body language, which makes recognizing them from voice alone difficult. Our goal is to build a model that effectively classifies the emotional tone of vocal expressions.

Datasets

Models

A CNN trained on the Mel spectrograms of the audio files.
A CNN trained on the Mel Frequency Cepstral Coefficients (MFCCs) of the audio files.
A CRNN (convolutional recurrent neural network) that also uses MFCCs as input.
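The three models differ mainly in their input features. As an illustration (not the notebook's code), the numpy-only sketch below computes a log-Mel spectrogram and then derives MFCCs from it by applying an orthonormal DCT-II along the Mel axis, which is the standard relationship between the two representations. The function names, the HTK-style Mel formula, and the parameter values (n_fft=2048, hop=512, 128 Mel bands) are assumptions chosen for the example; the project's actual extraction settings are not stated in this README.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel formula (assumed; other toolkits use slightly different variants)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # peak of 1 at the center
    return fb

def log_mel_spectrogram(y, sr, n_fft=2048, hop=512, n_mels=128):
    """Hann-windowed frames -> power spectrum -> Mel energies -> natural log.
    Returns shape (n_mels, n_frames); frames are not padded at the edges."""
    if len(y) < n_fft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T       # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                            # epsilon avoids log(0)

def dct_ii_ortho(n_out, n_in):
    """Orthonormal DCT-II matrix, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(y, sr, n_mfcc=40, **kwargs):
    """MFCCs = DCT-II of the log-Mel spectrogram, keeping the first n_mfcc coefficients."""
    log_mel = log_mel_spectrogram(y, sr, **kwargs)
    return dct_ii_ortho(n_mfcc, log_mel.shape[0]) @ log_mel  # (n_mfcc, n_frames)
```

For one second of 22.05 kHz audio, `log_mel_spectrogram` returns a (128, 40) array and `mfcc(y, sr, n_mfcc=20)` a (20, 40) array. The DCT compresses the correlated Mel bands into a few decorrelated coefficients, which is one plausible reason a model on MFCCs can behave differently from one on the full spectrogram.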

Project Structure

Gathering Data
Data Organization and Cleaning
Data Exploration, Preparation, and Visualization
Data Preprocessing
Model Implementation

All these components are detailed in the speech_emotion_recognition.ipynb Jupyter notebook.
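The Data Preprocessing step is not described in detail in this README. One common way to feed variable-length MFCC or Mel-spectrogram matrices to a Keras CNN is sketched below under stated assumptions: every clip is zero-padded or truncated to a fixed number of frames, string emotion labels are mapped to integer ids, a trailing channel axis is added, and each coefficient is standardized with training-set statistics. The helpers pad_or_truncate, build_dataset, and standardize are hypothetical names, not taken from the notebook.

```python
import numpy as np

def pad_or_truncate(feat, n_frames):
    """Force a (n_coeffs, T) feature matrix to exactly n_frames columns (zero-padding on the right)."""
    n_coeffs, t = feat.shape
    if t >= n_frames:
        return feat[:, :n_frames]
    out = np.zeros((n_coeffs, n_frames), dtype=feat.dtype)
    out[:, :t] = feat
    return out

def build_dataset(features, labels, n_frames):
    """Stack features into a (N, n_coeffs, n_frames, 1) tensor for a 2-D CNN
    and map string labels to integer ids in sorted order."""
    x = np.stack([pad_or_truncate(f, n_frames) for f in features])
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    y = np.array([index[label] for label in labels])
    return x[..., np.newaxis], y, classes

def standardize(x_train, x_test):
    """Per-coefficient z-scoring; mean and std come from the training split only."""
    mean = x_train.mean(axis=(0, 2, 3), keepdims=True)
    std = x_train.std(axis=(0, 2, 3), keepdims=True) + 1e-8  # guard against zero variance
    return (x_train - mean) / std, (x_test - mean) / std
```

Computing the normalization statistics on the training split only keeps information from the test set out of training; the same mean and standard deviation are then reused for the test data.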

Insights

The Mel spectrogram CNN performed well but had trouble distinguishing some emotions from one another. The MFCC CNN was the most successful model, suggesting that, at least in this project, MFCCs are a better input representation for emotion recognition in audio than Mel spectrograms. The CRNN with MFCCs also gave good results but was prone to overfitting and did not outperform the MFCC CNN.

Evaluation Metrics

The models were evaluated with precision, recall, and F1 score, which reveal per-class behavior (for example, which emotions a model tends to confuse) that overall accuracy can hide. The MFCC CNN was the top performer, achieving the highest scores on these metrics.
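For reference, per-class precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean, 2PR / (P + R). The README does not say how per-class scores were aggregated; the pure-Python sketch below assumes an unweighted (macro) average over emotion classes, which should match scikit-learn's average="macro" for the same labels. The function names are hypothetical.

```python
from collections import Counter

def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred.
    Undefined ratios (zero denominators) are reported as 0.0."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    pred_count = Counter(y_pred)                                # TP + FP per class
    true_count = Counter(y_true)                                # TP + FN per class
    scores = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
```

With y_true = ["angry", "angry", "sad", "happy"] and y_pred = ["angry", "sad", "sad", "happy"], accuracy is 0.75, but the per-class scores show "angry" with recall 0.5 and "sad" with precision 0.5, exposing the angry-to-sad confusion; the macro F1 is 7/9, about 0.78.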

About

Sentiment Analysis from Speech!

