Table 3: Audio-Visual Speech Recognition and 1:25000 audio-video retrieval results with different supervisions.
zzzzhuque opened this issue.
Hi @ZHUTAO142857, sorry I didn't notice this issue earlier.
I ran the audio-visual recognition task (word classification on LRW) as described in the paper. These are the classification accuracies using video only, audio only, and both modalities combined.