mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project

A potential bug

tomato18463 opened this issue

Hi

I used part of your code in my work and found a potential bug (though I have not run your original code). Could you please check it? Specifically, this line pads the audio data if its length is less than 640 times the length of the corresponding video data. This line says the variable data has a size of Tx1, so the torch.nn.functional.pad call in this line will produce an output of size Tx(1+padding). This seems incorrect to me. I think the padded result is supposed to be (T+padding)x1, so this line may need to be changed to something like torch.nn.functional.pad(data, (0, 0, 0, padding), "constant"). I may be wrong, as I have not run your original code, but could you please check it anyway?

Thanks!
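
The shape difference described above can be checked with a small sketch. It does not reproduce the repository's code: the sizes T and padding are hypothetical, and it assumes the original call passed a two-element pad tuple such as (0, padding), which the issue does not quote. torch.nn.functional.pad reads the pad tuple starting from the last dimension, so a two-element tuple only widens the last dimension, while a four-element tuple can reach the time dimension of a Tx1 tensor.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
T, padding = 5, 3
data = torch.ones(T, 1)  # stand-in for audio data of shape T x 1

# Assumed form of the original call: a 2-element tuple pads the LAST
# dimension only, giving T x (1 + padding).
padded_last_dim = F.pad(data, (0, padding), "constant")

# Suggested form: for a 2-D tensor the 4-element tuple is
# (last_left, last_right, first_left, first_right), so this appends
# `padding` zero rows along the time dimension, giving (T + padding) x 1.
padded_time_dim = F.pad(data, (0, 0, 0, padding), "constant")

print(padded_last_dim.shape)  # torch.Size([5, 4])
print(padded_time_dim.shape)  # torch.Size([8, 1])
```

With these sizes, only the second call extends the audio along the time axis, which matches the (T+padding)x1 result the issue expects.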

Hi @tomato18463, I have fixed the bug. Thank you for pointing this out! #20