AdityaDutt / SpeechEmotionRecognitionPapers

A curated list on the literature of emotion recognition using deep learning.


This repository contains a collection of papers related to Emotion Recognition. A report on all audio datasets is available here.


I gathered these resources for my Ph.D. literature review. In this curated list of papers, I have focused on speech emotion recognition, multi-modal emotion recognition (text + face + speech), and ethics in emotion recognition. I am starting the review based on these papers, but I will keep expanding the list. Feel free to contact me if I have missed some papers.

My personal interests in SER include natural speech generation conditioned on emotion. State-of-the-art text-to-speech (TTS) deep learning models such as Tacotron and WaveNet generate natural speech by conditioning on spectrograms or mel-spectrograms of a particular voice. Similarly, they could be conditioned on spectrograms of a particular emotion to synthesize a more lifelike voice. I am also interested in using emotion detection in speech recognition, where it can help virtual assistants understand users better when the meaning of their words is ambiguous. Lastly, it is useful for monitoring people’s emotional states to identify suspicious behavior in public places like airports.

Speech Emotion Recognition has several other applications. It is being used in gaming to improve players’ experience: gaming companies are investing substantially in Emotion AI, using eye-tracking, facial coding, and voice analysis to recognize and interpret human emotions. It can also be used to better support patients with psychological disorders and to assist in psychiatric counseling.

Annette Zimmermann, the renowned Gartner analyst, stated: “By 2022, your personal device will know more about your emotional state than your own family.” Emotion AI is advancing rapidly. According to Gartner, ten percent of personal devices worldwide will possess Emotion AI features, either on-device or via cloud services, by 2022.

Speech-Based Emotion Recognition (28 papers)

  1. Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models, 2021 Link

  2. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings, 2021 Link

  3. Cross corpus multi-lingual speech emotion recognition using ensemble learning, 2021 Link

  4. Contrastive Unsupervised Learning for Speech Emotion Recognition, 2021 Link

  5. Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning, 2021 Link

  6. Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention, 2021 Link

  7. Autoencoder With Emotion Embedding for Speech Emotion Recognition, 2021 Link

  8. End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition, 2020 Link

  9. Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding, 2020 Link

  10. A Siamese Neural Network with Modified Distance Loss For Transfer Learning in Speech Emotion Recognition (Kexin Feng), 2020 Link

  11. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers, 2019 Link

  12. An Ensemble Model for Multi-Level Speech Emotion Recognition, 2019 Link

  13. Speech emotion recognition using deep 1D & 2D CNN LSTM networks, 2019 Link

  14. wav2vec: Unsupervised Pre-training for Speech Recognition, 2019 Link

  15. Attention Based Fully Convolutional Network for Speech Emotion Recognition, 2019 Link

  16. Direct Modelling of Speech Emotion from Raw Speech, 2019 Link

  17. Improving Speech Emotion Recognition with Unsupervised Representation Learning on Unlabeled Speech, 2019 Link

  18. Context-Aware Emotion Recognition Networks, 2019 Link

  19. Enhanced speech emotion detection using deep neural networks, 2018 Link

  20. Adversarial Auto-Encoders for Speech Based Emotion Recognition, 2018 Link

  21. Speech Emotion Recognition via Contrastive Loss under Siamese Networks, 2018 Link

  22. Transfer Learning for Improving Speech Emotion Classification Accuracy, 2018 Link

  23. Using regional saliency for speech emotion recognition, 2017 Link

  24. Research on speech emotion recognition based on deep auto-encoder, 2016 Link

  25. Shape-based modeling of the fundamental frequency contour for emotion detection in speech, 2014 Link

  26. Speech emotion recognition approaches in human computer interaction, 2011 Link

  27. Speaker Independent Speech Emotion Recognition by Ensemble Classification, 2005 Link

  28. (Original Methods) Speech emotion recognition using hidden Markov models, 2003 Link

Audio Visual Emotion Recognition (12 papers)

  1. Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning, 2021 Link

  2. An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos, 2020 Link

  3. Metric Learning-Based Multimodal Audio-Visual Emotion Recognition, 2020 Link

  4. Integrating Multimodal Information in Large Pretrained Transformers, 2020 Link

  5. Learning Better Representations for Audio-Visual Emotion Recognition with Common Information, 2020 Link

  6. Audio-visual emotion fusion (AVEF): A deep efficient weighted approach, 2019 Link

  7. Bimodal Emotion Recognition Based on Convolutional Neural Network, 2019 Link

  8. A Multimodal Deep Regression Bayesian Network for Affective Video Content Analyses, 2017 Link

  9. Audio-visual emotion recognition using deep transfer learning and multiple temporal models, 2017 Link

  10. Audio-Visual Emotion Recognition in Video Clips, 2017 Link

  11. Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition, 2017 Link

  12. Predicting Emotions in User-Generated Videos, 2014 Link

Audio Text Emotion Recognition (5 papers)

  1. Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text, 2020 Link

  2. Fusion Approaches for Emotion Recognition from Speech Using Acoustic and Text-Based Features, 2020 Link

  3. WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition, 2020 Link

  4. Cooperative Multimodal Approach to Depression Detection in Twitter, 2019 Link

  5. Multimodal Speech Emotion Recognition Using Audio and Text, 2018 Link

Audio, Visual, Text Emotion Recognition (6 papers)

  1. Speech, Voice, Text, And Meaning: A Multidisciplinary Approach to Interview Data through the use of digital tools, 2020 Link

  2. M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues, 2020 Link

  3. Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis, 2020 Link

  4. Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals, 2020 Link

  5. Multi-Modal Speech Emotion Recognition Using Speech Embeddings and Audio Features, 2019 Link

  6. Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis, 2019 Link

Ethical AI in Emotion Recognition (7 papers, 4 articles)

  1. The Ethics of Emotion in Artificial Intelligence Systems, 2021 Link

  2. Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis, 2021 Link

  3. Ethics Sheets for AI Tasks, 2021 Link

  4. The Ethics of AI and Emotional Intelligence, 2020 Link

  5. Practical and Ethical Considerations in the Effective use of Emotion and Sentiment Lexicons, 2020 Link

  6. Developing a Legal Framework for Regulating Emotion AI, 2020 Link

  7. Social and Emotion AI: The Potential for Industry Impact, 2019 Link

  8. (Article By Harvard Business Review) The Risks of Using AI to Interpret Human Emotions, 2019 Link

  9. (Article By MIT Sloan) Emotion AI explained, 2019 Link

  10. China is home to a growing market for dubious “emotion recognition” technology, 2021 Link

  11. 'Every smile you fake' — an AI emotion-recognition system can assess how 'happy' China's workers are in the office, 2021 Link

Review papers (12 papers)

  1. Survey on Machine Learning in Speech Emotion Recognition and Vision Systems Using a Recurrent Neural Network (RNN), 2021 Link

  2. A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism, 2021 Link

  3. A Comprehensive Review of Speech Emotion Recognition Systems, 2021 Link

  4. Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review, 2020 Link


  6. A survey of facial expression recognition based on deep learning, 2020 Link

  7. Speech Emotion Recognition Using Deep Learning Techniques: A Review, 2019 Link

  8. Emotion Recognition using Multimodal Residual LSTM Network, 2019 Link

  9. Deep Learning Techniques for Speech Emotion Recognition: A Review, 2019 Link

  10. Emotion detection from text and speech: a survey, 2018 Link

  11. A review of affective computing: From unimodal analysis to multimodal fusion, 2017 Link


If you would like to suggest papers, please contact me: aditya.dutt {at}

