Table 3: Audio-Visual Speech Recognition and 1:25000 audio-video retrieval results with different supervisions.

Question

zzzzhuque opened this issue 6 years ago

Hi, after reading the paper, I am confused about Table 3.
What do visual acc, audio acc, and combine acc mean?
How did you calculate the results 67.5%, 91.8%, and 95.2%?

Hang_Zhou · Answer 1 · Mon Jun 03 2019 16:39:43 GMT+0800 (China Standard Time)

Hi @ZHUTAO142857, sorry that I didn't notice this issue earlier.

I performed the audio-visual speech recognition task (word classification on LRW) as described in the paper. The three numbers are the classification accuracies obtained using only video, only audio, and the combination of the two.
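To make the three accuracies concrete, here is a minimal sketch of how top-1 word-classification accuracy could be computed per modality. It is not the code from the paper. The helper names (`top1_accuracy`, `average_scores`), the toy scores, and in particular the fusion step (averaging the two modalities' class scores) are assumptions for illustration, since the thread does not say how the combination is formed.

```python
# Hypothetical sketch: top-1 word-classification accuracy per modality.
# The score-averaging fusion is an assumption, not the method from the paper.

def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(
        1 for row, label in zip(scores, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)


def average_scores(visual_scores, audio_scores):
    """Assumed late fusion: element-wise mean of the two score matrices."""
    return [
        [(v + a) / 2 for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(visual_scores, audio_scores)
    ]


# Toy data: 4 samples, 3 classes (made-up numbers, for illustration only).
labels = [0, 1, 2, 1]
visual_scores = [
    [0.6, 0.3, 0.1],  # predicts 0, correct
    [0.5, 0.4, 0.1],  # predicts 0, wrong
    [0.2, 0.5, 0.3],  # predicts 1, wrong
    [0.1, 0.8, 0.1],  # predicts 1, correct
]
audio_scores = [
    [0.7, 0.2, 0.1],  # predicts 0, correct
    [0.2, 0.7, 0.1],  # predicts 1, correct
    [0.1, 0.3, 0.6],  # predicts 2, correct
    [0.5, 0.4, 0.1],  # predicts 0, wrong
]

visual_acc = top1_accuracy(visual_scores, labels)
audio_acc = top1_accuracy(audio_scores, labels)
combine_acc = top1_accuracy(average_scores(visual_scores, audio_scores), labels)

print(f"visual acc:  {visual_acc:.1%}")
print(f"audio acc:   {audio_acc:.1%}")
print(f"combine acc: {combine_acc:.1%}")
```

On this toy data the combined scores classify samples correctly that each single modality misses, which is the general pattern the three columns in Table 3 are meant to show.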