Bag-of-word-SER

Fig 1: End-to-end pipeline for the proposed bag-of-modulation-spectral-features extraction and speech emotion recognition (SER). The top part shows the signal processing steps involved in the bag-of-audio-words (BoAW) computation.

Step 1: Extract modulation spectral features using a window size of 256 ms and a frame size of 40 ms (or 64 ms).
Step 2: Compute a bag-of-words (BoW) representation on top of these modulation spectral features.
Step 3: Feed the BoW features as input to the LSTM model.
Step 4: Extract the speech-to-reverberation modulation energy ratio (SRMR) as a quality feature; it can be fused with the BoW modulation features to improve robustness.
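
Steps 1 and 2 can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from this repository. It assumes a single-band envelope (the rectified waveform) instead of a full auditory filterbank, treats the 40 ms figure as the hop between 256 ms windows, and learns the codebook with a plain k-means loop. All function names and the values `n_mod_bins`, `n_words`, and `segment_len` are hypothetical.

```python
import numpy as np

def frame_signal(x, win_len, hop_len):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]

def modulation_features(x, sr, win_ms=256, hop_ms=40, n_mod_bins=8):
    """Log modulation spectrum of the envelope for each analysis window.

    Assumption: the envelope is the rectified waveform; a fuller system
    would use a filterbank and a Hilbert envelope per band.
    """
    win_len = int(sr * win_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    envelope = np.abs(x)
    frames = frame_signal(envelope, win_len, hop_len) * np.hanning(win_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Keep the lowest modulation-frequency bins, skipping the DC bin.
    return np.log1p(spectrum[:, 1:n_mod_bins + 1])

def assign_words(features, codebook):
    """Index of the nearest codeword (squared Euclidean) for each frame."""
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def learn_codebook(features, n_words, n_iter=20, seed=0):
    """Plain k-means codebook; empty clusters keep their previous centre."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), n_words, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_words(features, codebook)
        for k in range(n_words):
            members = features[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def bag_of_words(features, codebook, segment_len):
    """Normalised codeword histogram for each block of `segment_len` frames."""
    labels = assign_words(features, codebook)
    n_seg = len(labels) // segment_len
    hists = np.zeros((n_seg, len(codebook)))
    for s in range(n_seg):
        seg = labels[s * segment_len:(s + 1) * segment_len]
        hists[s] = np.bincount(seg, minlength=len(codebook)) / segment_len
    return hists

# Usage on synthetic noise standing in for a 3-second utterance at 16 kHz.
sr = 16000
x = np.random.default_rng(0).standard_normal(3 * sr)
feats = modulation_features(x, sr)            # (n_frames, n_mod_bins)
codebook = learn_codebook(feats, n_words=16)  # (n_words, n_mod_bins)
bow = bag_of_words(feats, codebook, segment_len=10)
print(feats.shape, bow.shape)
```

Each row of `bow` is one time step of a histogram sequence, which is the kind of input a recurrent model such as the LSTM in Step 3 could consume. In practice the codebook would be learned on training data only and reused for test utterances.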

Fig 2: Average modulation spectrogram plots for unprocessed (top row) and processed (bottom row) speech, for high-valence (left column) and low-valence (right column) emotional states.
