mpc001 / Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Can the GRU be replaced with a Transformer?

hegc opened this issue · comments

commented

Hi, yes, both are suitable for sequence modelling.

commented

I replaced the GRU with a Transformer, but the model did not converge. If you have time to try it, please share your results. Thank you.

Hi, when using a Transformer, I suggest carefully tuning the warm-up stage, in particular the peak learning rate and the number of warm-up steps.
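
To make the two knobs mentioned above concrete, here is a minimal sketch of a warm-up schedule in plain Python. The thread does not say which schedule shape to use, so the linear warm-up followed by cosine decay is an assumption, and the function name `warmup_cosine_lr` and its parameters are hypothetical and not taken from the repository.

```python
import math


def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Return the learning rate for a 0-indexed optimizer step.

    Hypothetical helper: linear warm-up to peak_lr, then cosine decay
    to min_lr. The schedule shape is an assumption, not the repo's method.
    """
    if step < warmup_steps:
        # Linear warm-up: ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from peak_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

If the Transformer variant diverges early, lowering `peak_lr` or increasing `warmup_steps` are the two adjustments this sketch exposes. In a PyTorch training loop, a function like this could be adapted for `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor on the base learning rate rather than an absolute value.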