A selection of repositories under the audio-visual-speech-recognition topic:
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, supporting Speech Recognition, Voice Activity Detection, Text Post-processing, and more.
Human Emotion Understanding using multimodal dataset.
Transformer-based online speech recognition system with TensorFlow 2
End-to-End Multiview Lip Reading
Kaldi-based audio-visual speech recognition
🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.
Code related to the fMRI experiment on the contextual modulation of the McGurk Effect
In this repository, I use k2, icefall, and Lhotse for lip reading, adapting them to the lip-reading task; support for additional lip-reading datasets is planned.
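To illustrate the core idea shared by the audio-visual systems above, here is a minimal sketch of decision-level (late) fusion: per-frame class log-probabilities from an audio model and a visual (lip-reading) model are combined with a reliability weight before picking the most likely class. The function name, shapes, and weighting scheme are illustrative assumptions, not the API of any repository listed here.

```python
# Hypothetical late-fusion sketch for audio-visual speech recognition.
# Inputs: per-frame log-probabilities from each modality (frames x classes).
import numpy as np

def late_fusion(audio_logprobs: np.ndarray,
                visual_logprobs: np.ndarray,
                audio_weight: float = 0.7) -> np.ndarray:
    """Weighted combination of modality log-probabilities.

    audio_weight reflects how much the (usually more reliable) audio
    stream is trusted relative to the visual stream.
    """
    fused = audio_weight * audio_logprobs + (1.0 - audio_weight) * visual_logprobs
    return fused.argmax(axis=1)  # most likely class index per frame

# Toy example: 2 frames, 3 classes.
audio = np.log(np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.8, 0.1]]))
visual = np.log(np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.2, 0.6]]))
print(late_fusion(audio, visual))  # per-frame class predictions
```

In real systems the weight is often estimated from signal quality (e.g. acoustic noise level), and fusion may instead happen at the feature level inside a single network; this sketch only shows the simplest decision-level variant.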