flask google-speech-to-text heroku librosa mel-spectrogram mfcc microservice speech-emotion-recognition deep-learning

Welcome to mock-buddy-audio-server 👋

This repo contains the audio processing service (microservice) for the Mock-Buddy application. It uses Flask to build WebSocket and RESTful APIs; you can see the project description here. The service is deployed on Heroku.

The system analyzes a speech and generates reports based on how good the user's voice is throughout the speech and on their speech rate. Both strongly affect audience engagement: if the speaker does not sound confident, the audience engagement rate decreases over time. Speaking without fear, and in an engaging rather than monotonous way, is key to increasing audience engagement. The speaker also needs to be aware of their speech rate while presenting, because even after careful practice they may speak faster out of joy or slower out of fear.

The workflow is:

Detect speech rate
Detect speech confidence

Speech rate

Speech rate is calculated by dividing the number of words spoken by the time taken. Recorded speech is transcribed using the Google Speech-to-Text API, and the speaking time is measured accurately with the help of voice activity detection (VAD).
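As a minimal sketch of the calculation described above (not the service's actual code), the following assumes the transcript is already available as a string and that VAD has produced a list of voiced `(start, end)` segments in seconds. The function names `voiced_seconds` and `speech_rate_wpm` are hypothetical.

```python
def voiced_seconds(segments):
    """Sum the durations of voiced (start, end) segments, in seconds.

    `segments` stands in for whatever the VAD step returns; the
    (start, end) tuple format is an assumption for this sketch.
    """
    return sum(end - start for start, end in segments)


def speech_rate_wpm(transcript, speech_seconds):
    """Return words per minute: word count divided by speaking time."""
    if speech_seconds <= 0:
        raise ValueError("speech_seconds must be positive")
    # Whitespace splitting is a simple stand-in for real word counting.
    word_count = len(transcript.split())
    return word_count * 60.0 / speech_seconds


# Example: 10 words spoken over 5 seconds of voiced audio.
segments = [(0.0, 2.5), (3.0, 5.5)]  # the 0.5 s pause is excluded
text = "one two three four five six seven eight nine ten"
rate = speech_rate_wpm(text, voiced_seconds(segments))
```

Counting only voiced segments keeps pauses from lowering the measured rate, which is presumably why VAD is used for the timing.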

Speech confidence

The speech confidence score is calculated from the output of a speech emotion classifier. A CNN architecture was used to build the speech emotion classifier (also known as speech emotion recognition). The model was trained on the RAVDESS, SAVEE, and TESS datasets. Training details are in this repo.
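The README does not say how the classifier output is mapped to a score, so the following is only one possible sketch. It assumes the classifier returns a probability per emotion label and that each label is given a hand-picked weight. The label set, the `CONFIDENCE_WEIGHTS` values, and the function name `confidence_score` are all hypothetical.

```python
# Hypothetical weights: how strongly each emotion is taken to signal
# a confident delivery (1.0 = confident, 0.0 = not confident).
CONFIDENCE_WEIGHTS = {
    "calm": 1.0,
    "happy": 1.0,
    "neutral": 0.75,
    "surprised": 0.5,
    "angry": 0.5,
    "sad": 0.25,
    "disgust": 0.25,
    "fearful": 0.0,
}


def confidence_score(probabilities):
    """Return a score in [0, 1] as the weighted average of emotion probabilities.

    `probabilities` maps emotion label -> classifier probability.
    Labels missing from CONFIDENCE_WEIGHTS contribute a weight of 0.
    """
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(
        CONFIDENCE_WEIGHTS.get(label, 0.0) * p
        for label, p in probabilities.items()
    )
    return weighted / total


# Example: a clip judged half calm, half fearful.
score = confidence_score({"calm": 0.5, "fearful": 0.5})
```

Dividing by the total makes the result insensitive to whether the classifier output is exactly normalized.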

Prerequisites

FFmpeg, portaudio19-dev (for audio processing)
Google Cloud account
Environment variables: GOOGLE_APPLICATION_CREDENTIALS (path to google_credentials.json) and GOOGLE_CREDENTIALS (content of that file)
Python 3.7 or newer
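The two environment variables suggest a common pattern on hosts such as Heroku, where the credentials file cannot be uploaded directly: the JSON content is stored in one variable and written at startup to the path named by the other. Whether this service does exactly that is an assumption; the helper name `materialize_credentials` is hypothetical.

```python
import os


def materialize_credentials(env=None):
    """Write GOOGLE_CREDENTIALS to the GOOGLE_APPLICATION_CREDENTIALS path.

    Returns the path if both variables are set, otherwise None.
    An existing file at that path is left untouched.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    content = env.get("GOOGLE_CREDENTIALS")
    if not path or not content:
        return None
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
    return path
```

Calling `materialize_credentials()` once before the first Speech-to-Text request would let the Google client library find the file through GOOGLE_APPLICATION_CREDENTIALS.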

Install

pip install -r requirements.txt

Usage

python3 app.py

Author

👤 Karthick T. Sharma

GitHub: @Karthick47v2
LinkedIn: @Karthick47

🤝 Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

About

audio processing service for mock-buddy

MIT License

Languages

PureBasic 78.0%, Python 21.8%, Procfile 0.1%, Shell 0.1%