Multimodal speech recognition that combines lipreading (with CNNs) and audio (with LSTMs). The two modalities are fused by an attention network.
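As a rough illustration of attention-based fusion, the sketch below weights a visual feature vector and an audio feature vector by softmax attention scores and sums them. It is not taken from this repository: the function names `softmax` and `attention_fuse`, the `score_weights` vector (a stand-in for learned parameters), and the assumption that both modalities are already encoded to the same length are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(video_feat, audio_feat, score_weights):
    """Fuse two equal-length feature vectors with attention weights.

    video_feat:    hypothetical output of the lipreading (CNN) branch
    audio_feat:    hypothetical output of the audio (LSTM) branch
    score_weights: hypothetical learned vector; each modality's score is
                   its dot product with this vector
    Returns (fused_vector, [alpha_video, alpha_audio]).
    """
    if not (len(video_feat) == len(audio_feat) == len(score_weights)):
        raise ValueError("all vectors must have the same length")

    modalities = [video_feat, audio_feat]

    # One scalar relevance score per modality.
    scores = [
        sum(w * x for w, x in zip(score_weights, feat))
        for feat in modalities
    ]

    # Normalise the scores so the modality weights sum to 1.
    alphas = softmax(scores)

    # Weighted sum of the modality features, element by element.
    fused = [
        sum(a * feat[i] for a, feat in zip(alphas, modalities))
        for i in range(len(video_feat))
    ]
    return fused, alphas


if __name__ == "__main__":
    fused, alphas = attention_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
    print(fused, alphas)
```

With an all-zero `score_weights`, both modalities receive the same score, so each gets weight 0.5. In a trained model the scores would come from learned parameters, letting the network lean on the lip features when the audio is noisy and on the audio otherwise.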