SER (Speech Emotion Recognition)
An approach to predicting speech emotions from audio clips using deep learning.
To learn more about the approach: article
Folders
datasets: contains data downloaded from Kaggle (the download happens in the feature_extraction notebook)
saved_datasets: contains locally saved numpy datasets
models: contains model checkpoints
logs: contains TensorBoard logs
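As a rough illustration of how this layout might be used, the sketch below creates the four folders and round-trips a feature array through saved_datasets. The helper names and the `features.npz` file name are hypothetical, and the notebooks may store their arrays differently.

```python
from pathlib import Path

import numpy as np

# Folder names taken from the list above.
FOLDERS = ("datasets", "saved_datasets", "models", "logs")


def ensure_folders(root="."):
    """Create the project folders under `root` if they do not exist yet."""
    root = Path(root)
    paths = [root / name for name in FOLDERS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def save_features(features, labels, root=".", name="features.npz"):
    """Save feature and label arrays into saved_datasets/ (hypothetical file name)."""
    path = Path(root) / "saved_datasets" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    np.savez_compressed(path, features=features, labels=labels)
    return path


def load_features(path):
    """Load the arrays written by save_features."""
    with np.load(path) as data:
        return data["features"], data["labels"]
```

Keeping extracted features in a single compressed `.npz` file means the system notebook can reload them without repeating the extraction step.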
Versions
Python version: 3.6.9
CUDA version: 10.1
How to run
- Set up your Kaggle API credentials, or just upload kaggle.json to the root of the repo
- Run the feature extraction notebook, then the system notebook
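One possible way to set up the Kaggle credentials is sketched below. It assumes the Kaggle client's usual behaviour of reading `kaggle.json` from `~/.kaggle` (or from the directory named by `KAGGLE_CONFIG_DIR`) and warning when the file is readable by other users. The `install_kaggle_token` helper is hypothetical and not part of this repo.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", config_dir=None):
    """Copy kaggle.json into the Kaggle config directory and restrict its permissions."""
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; create an API token in your Kaggle account settings")

    # Assumption: the client looks in KAGGLE_CONFIG_DIR if set, else ~/.kaggle.
    default_dir = os.environ.get("KAGGLE_CONFIG_DIR", Path.home() / ".kaggle")
    config_dir = Path(config_dir or default_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    dest = config_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    # Owner read/write only (0o600), so the token is not readable by other users.
    dest.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

Calling `install_kaggle_token("kaggle.json")` from the repo root before opening the notebooks should be enough on Linux or macOS; on Windows the permission step has little effect.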
Other works that inspired me
Audio emotion (a series of 5 notebooks): https://www.kaggle.com/ejlok1/audio-emotion-part-1-explore-data by https://www.kaggle.com/ejlok1
Audio data analysis: https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-1.html
References
RAVDESS dataset: - https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391
TESS dataset: - https://www.kaggle.com/ejlok1/toronto-emotional-speech-set-tess - https://tspace.library.utoronto.ca/handle/1807/24487
SAVEE dataset: - http://kahlan.eps.surrey.ac.uk/savee/ - https://www.kaggle.com/barelydedicated/savee-database
CREMA-D dataset: - https://github.com/CheyneyComputerScience/CREMA-D - https://www.kaggle.com/ejlok1/cremad