In the IEMOCAP dataset, the two most important emotional cues are speech and facial expression. This repo describes the emotion recognition task using state-of-the-art machine learning (ML) techniques.
Most of the data cleaning and preprocessing is similar to my other work here and here, so a detailed description is not repeated. A brief summary follows:
Step 1: Extract the speech features using this code, and handle the missing values in the facial/video features using the preprocessing code.
Speech features: pitch, energy, 12 MFCCs (mel-frequency cepstral coefficients), 27 MFBs (mel filter bank energies)
Facial features: 46 facial landmark features at the following locations on the face:
chin, forehead, cheek, upper eyebrow, eyebrow, mouth
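The README does not say how the missing facial values are filled. A common option is to interpolate each landmark dimension linearly over time. The sketch below assumes exactly that; `interpolate_missing` is a hypothetical helper, not the repo's preprocessing code, which may handle gaps differently (for example by dropping frames).

```python
import numpy as np


def interpolate_missing(features: np.ndarray) -> np.ndarray:
    """Fill NaNs in a (num_frames, num_dims) array by linear interpolation over time.

    Hypothetical helper (assumption): the repo's own preprocessing may differ.
    """
    filled = features.astype(float).copy()
    frames = np.arange(filled.shape[0])
    for d in range(filled.shape[1]):
        column = filled[:, d]  # a view, so writes go straight into `filled`
        missing = np.isnan(column)
        if missing.all():
            # No observed value in this dimension: fall back to zeros.
            column[:] = 0.0
        elif missing.any():
            # np.interp holds the first/last observed value constant at the edges.
            column[missing] = np.interp(frames[missing], frames[~missing], column[~missing])
    return filled
```

Linear interpolation keeps the frame count unchanged, so the facial stream stays aligned with the frame-wise speech features used in Step 2.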
Step 2: Compute utterance-level statistical features from the frame-wise speech and facial features. The statistics are:
1. Mean, Standard Deviation, 1st and 3rd Quartiles, and Interquartile Range of the Pitch, 12 MFCC coefficients, 27 MFB coefficients, and Energy values.
2. Mean, Standard Deviation, 1st and 3rd Quartiles, and Interquartile Range of the facial landmark features.
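The five statistics above reduce a variable-length sequence of frames to one fixed-length vector per utterance. Here is a minimal sketch, assuming population standard deviation (`ddof=0`), NumPy's default linear percentile method, and an output grouped statistic by statistic. The repo's actual ordering and estimator settings are not stated, and the function name is hypothetical.

```python
import numpy as np


def utterance_statistics(frames: np.ndarray) -> np.ndarray:
    """Summarise a (num_frames, num_dims) matrix into one utterance-level vector.

    Returns, for every dimension: mean, standard deviation, 1st quartile,
    3rd quartile and interquartile range (grouped statistic by statistic).
    """
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)  # population std (ddof=0), an assumption
    q1, q3 = np.percentile(frames, [25, 75], axis=0)
    iqr = q3 - q1
    return np.concatenate([mean, std, q1, q3, iqr])
```

Applied to a `(num_frames, d)` matrix, this yields a vector of length `5 * d`, so the final feature size depends only on how many frame-wise dimensions go in.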
This gives a total of 895 features. For the emotion recognition task, we use four emotions: Anger, Happy, Neutral, and Sad. The features can be created by setting `window_type = 'static'` in the `process_data` function of `window_based_reformation.py` and by setting `step = 0` in the `creating_dataset` function of `Utt_Fore_Data_Prep.py`.
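The README gives the total (895) but not the breakdown. One decomposition that reproduces it, stated here purely as an assumption, is that each of the 46 facial landmarks contributes x, y and z values: 5 × (41 speech + 138 facial) = 895. The sketch below checks that arithmetic and restricts a dataset to the four target emotions. The label strings and the `keep_four_emotions` helper are hypothetical, and the repo's own label encoding may differ.

```python
import numpy as np

# Per-frame speech descriptors listed above: pitch, energy, 12 MFCCs, 27 MFBs.
SPEECH_DIMS = 1 + 1 + 12 + 27  # 41
# ASSUMPTION: each of the 46 facial landmarks contributes x, y and z values.
FACIAL_DIMS = 46 * 3  # 138
NUM_STATS = 5  # mean, std, Q1, Q3, IQR
TOTAL_FEATURES = NUM_STATS * (SPEECH_DIMS + FACIAL_DIMS)  # 5 * 179

# Hypothetical label strings; the repo may encode emotions differently.
TARGET_EMOTIONS = ("Anger", "Happy", "Neutral", "Sad")


def keep_four_emotions(features, labels, speaker_ids):
    """Drop every utterance whose label is not one of the four target emotions."""
    labels = np.asarray(labels)
    mask = np.isin(labels, TARGET_EMOTIONS)
    return features[mask], labels[mask], np.asarray(speaker_ids)[mask]
```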
The processed data can be loaded using the class. It loads the data, labels, and speaker IDs for all 10 speakers.
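Loading speaker IDs alongside the labels makes speaker-independent evaluation possible: train on nine speakers and test on the held-out one. The README does not spell out its evaluation protocol, so leave-one-speaker-out is an assumption here. The generator below is a hypothetical, dependency-light sketch; scikit-learn's `LeaveOneGroupOut` provides the same split.

```python
import numpy as np


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_idx, test_idx) once for each distinct speaker."""
    speaker_ids = np.asarray(speaker_ids)
    for speaker in np.unique(speaker_ids):
        test_mask = speaker_ids == speaker
        yield speaker, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

With 10 speakers this gives 10 folds. No utterance from the test speaker ever appears in training, which keeps speaker identity from leaking into the emotion scores.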
To run the machine learning models, we use the script . It contains classes for running the ML algorithms on the emotion recognition task.
- The class baseline_Gaussian_NB provides functions for training and testing a Gaussian Naive Bayes classifier, which serves as the baseline in our project.
- The class ML_SVM provides functions for training and testing a Support Vector Machine (SVM) with an RBF kernel. We grid-search `C` and `gamma` over all combinations of the values `[0.001, 0.01, 0.1, 1, 10]`.
- The class ML_RF provides functions for training and testing a Random Forest classifier. We tune the number of estimators over 80, 100, and 120.
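The three models and their search spaces map directly onto scikit-learn. The sketch below is not the repo's `baseline_Gaussian_NB`, `ML_SVM` or `ML_RF` code; `build_models` is a hypothetical helper. The `StandardScaler` in front of the SVM is an added assumption, since the README does not mention feature scaling, but RBF kernels are sensitive to feature scale. The inner `cv=3` is also an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Values searched for both C and gamma (all 25 combinations).
GRID_VALUES = [0.001, 0.01, 0.1, 1, 10]


def build_models(cv=3):
    """Return the three classifiers described above, keyed by a short name."""
    svm = GridSearchCV(
        # ASSUMPTION: standardise features before the RBF SVM.
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": GRID_VALUES, "svc__gamma": GRID_VALUES},
        cv=cv,
    )
    rf = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [80, 100, 120]},
        cv=cv,
    )
    return {"gaussian_nb": GaussianNB(), "svm_rbf": svm, "random_forest": rf}
```

Each returned object exposes the usual `fit`/`predict`/`score` interface. Within a leave-one-speaker-out loop, the grid search then runs only on the training speakers of each fold.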
The shows a detailed performance comparison across different machine learning algorithms, speakers, and emotion classes.
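A comparison across speakers and emotion classes comes down to computing accuracy within each group. The README does not name its metrics, so the helper below is a hypothetical sketch. Grouping by speaker ID gives per-speaker accuracy. Grouping by the true label gives per-class recall, whose mean is the unweighted accuracy often reported on IEMOCAP.

```python
import numpy as np


def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy restricted to each value of `groups` (e.g. speaker ID or true emotion)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g.item(): float(np.mean(y_true[groups == g] == y_pred[groups == g]))
        for g in np.unique(groups)
    }
```

For example, `per_group_accuracy(y_true, y_pred, speaker_ids)` gives one score per speaker, and `per_group_accuracy(y_true, y_pred, y_true)` gives one score per emotion.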