mpc001 / Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Must convert gray?

TryHard-LL opened this issue

Hi,
I have a question about 'convert-gray': must it be set to 'true', so that each clip has shape frames * h * w with no channel dimension? Can RGB input be used instead?
Another question: must we preprocess the videos into .npz files? Is that the only way to train the model?
Looking forward to your reply.

Thanks for your questions. Grayscale conversion is not mandatory: using RGB images as input is possible as long as the number of input channels of the 3D convolutional layer is set to 3.
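As a rough illustration of that change, here is a minimal PyTorch sketch. It is not the repository's code: `make_frontend3d` is a hypothetical helper, and the output width (64), kernel/stride/padding values, clip length (29 frames) and crop size (88x88) are assumptions chosen only to make the example concrete. The only point it demonstrates is that the first argument of `nn.Conv3d` has to match the channel dimension of the input.

```python
import torch
import torch.nn as nn


def make_frontend3d(in_channels: int, out_channels: int = 64) -> nn.Sequential:
    """Hypothetical helper: a 3D conv front-end with a configurable input channel count."""
    return nn.Sequential(
        # All hyperparameters below are illustrative assumptions.
        nn.Conv3d(in_channels, out_channels, kernel_size=(5, 7, 7),
                  stride=(1, 2, 2), padding=(2, 3, 3), bias=False),
        nn.BatchNorm3d(out_channels),
        nn.ReLU(inplace=True),
    )


# nn.Conv3d expects input laid out as (batch, channels, frames, height, width).
gray_clip = torch.randn(2, 1, 29, 88, 88)  # grayscale: 1 channel
rgb_clip = torch.randn(2, 3, 29, 88, 88)   # RGB: 3 channels

gray_out = make_frontend3d(in_channels=1)(gray_clip)
rgb_out = make_frontend3d(in_channels=3)(rgb_clip)

# Only the input channel count differs; the output shapes are identical.
print(gray_out.shape, rgb_out.shape)
```

Note that weights pretrained with a 1-channel first layer would not load into the 3-channel layer without adapting that layer's weight tensor, so RGB input most likely means training that layer from scratch.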
The load_data() method in lipreading/dataset.py supports NumPy data only, but it is not necessary to save pre-processed videos to ".npz" files. You can customise your own data loader.
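For the custom data loader, one possible shape is sketched below using NumPy only. `InMemoryClipDataset` is a hypothetical class, not part of the repository, and the random arrays stand in for frames you would decode from your own videos. It assumes each clip arrives as a uint8 array of shape (frames, h, w, 3) and returns either a grayscale clip of shape (frames, h, w) or a channel-first RGB clip of shape (3, frames, h, w).

```python
import numpy as np


class InMemoryClipDataset:
    """Hypothetical map-style dataset that serves clips without any .npz files."""

    # ITU-R BT.601 luma weights for RGB -> grayscale conversion.
    _LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)

    def __init__(self, clips, labels, to_gray=True):
        if len(clips) != len(labels):
            raise ValueError("clips and labels must have the same length")
        self.clips = clips
        self.labels = labels
        self.to_gray = to_gray

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        # Input clip: (frames, h, w, 3), uint8 in [0, 255]; scale to [0, 1].
        clip = np.asarray(self.clips[idx], dtype=np.float32) / 255.0
        if self.to_gray:
            clip = clip @ self._LUMA          # -> (frames, h, w)
        else:
            clip = clip.transpose(3, 0, 1, 2)  # -> (3, frames, h, w)
        return clip, self.labels[idx]


# Stand-in data: four random "clips" of 29 frames at 88x88 (assumed sizes).
rng = np.random.default_rng(0)
clips = [rng.integers(0, 256, size=(29, 88, 88, 3), dtype=np.uint8) for _ in range(4)]
labels = [0, 1, 2, 3]

gray_ds = InMemoryClipDataset(clips, labels, to_gray=True)
rgb_ds = InMemoryClipDataset(clips, labels, to_gray=False)
print(gray_ds[0][0].shape, rgb_ds[0][0].shape)
```

Because the class implements `__len__` and `__getitem__`, it can be passed to `torch.utils.data.DataLoader` like any other map-style dataset. The grayscale output would still need a channel dimension added before a 3D convolution.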