EmoNet

Audio-only Emotion Detection using Federated Learning

Contributors: Adar Arnon and John Keck

Proposal

Work Documentation

  • EmoNet is a federated learning system for emotion detection from audio features (MFCCs). The system consists of a server and a client: the server acts as the centralized source of truth for the most recently updated model, and the client is a public-facing web page where any user can run an inference or submit labeled audio for model improvement. This allows bootstrapping an audio-only model from user-provided, self-labeled data; a minimal sketch of both roles follows below.
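
The sketch below illustrates the two roles in broad strokes only; it is not code from this repository. The use of librosa for MFCC extraction, the function names, and the unweighted averaging scheme are assumptions made for illustration.

```python
# Illustrative sketch of the client/server split described above.
# Names and libraries are assumptions, not identifiers from EmoNet.

import numpy as np
import librosa  # assumed dependency for MFCC extraction


def extract_mfcc_features(audio_path: str, n_mfcc: int = 40) -> np.ndarray:
    """Client side: load a user-submitted clip and summarize it as a
    fixed-length MFCC feature vector for the shared emotion classifier."""
    signal, sample_rate = librosa.load(audio_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average over time frames so every clip yields the same feature dimension.
    return mfcc.mean(axis=1)


def federated_average(client_weight_sets: list[list[np.ndarray]]) -> list[np.ndarray]:
    """Server side: combine locally trained weight updates from clients into
    a new global model by simple (unweighted) federated averaging."""
    return [
        np.mean(layer_stack, axis=0)
        for layer_stack in zip(*client_weight_sets)
    ]
```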

EmoNet is hosted on Google Cloud Platform and can be accessed at https://emonet.xyz.

Datasets

References

Citations

  • Livingstone SR, Russo FA (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English." PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
  • S. Haq and P.J.B. Jackson, "Multimodal Emotion Recognition," in W. Wang (ed.), Machine Audition: Principles, Algorithms and Systems, IGI Global Press, ISBN 978-1615209194, chapter 17, pp. 398-423, 2010.
  • S. Haq and P.J.B. Jackson, "Speaker-Dependent Audio-Visual Emotion Recognition," in Proc. Int'l Conf. on Auditory-Visual Speech Processing, pp. 53-58, 2009.
  • S. Haq, P.J.B. Jackson, and J.D. Edge, "Audio-Visual Feature Selection and Reduction for Emotion Classification," in Proc. Int'l Conf. on Auditory-Visual Speech Processing, pp. 185-190, 2008.
  • C. Busso, M. Bulut, C.C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J.N. Chang, S. Lee, and S.S. Narayanan, "IEMOCAP: Interactive emotional dyadic motion capture database," Journal of Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, December 2008.
