Rudrabha / Lip2Wav

This is the repository containing codes for our CVPR, 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unable to reproduce the score claimed in paper

psui3905 opened this issue · comments

Hi, thanks for the great work.
I'm able to reproduce the score claimed in the paper using our pre-trained model weights. However, when I tried to train Lip2wav on the chem speaker without the weights. the score seems not very good. Here is the result I got:

# Speaker: chem
# Lip2wav - using pre-trained weights 
Mean PESQ: 1.2984
Mean STOI: 0.4285
Mean ESTOI: 0.3204

# Lip2wav - checkpoint on steps 230k 
Mean PESQ: 1.1618
Mean STOI: 0.3245
Mean ESTOI: 0.1539

Tensorboard:
Screen Shot 2021-09-06 at 8 39 30 pm

Both scores get under the same training environment (ffmpeg version 2.8.17) and the same codebase. What could be the potential issue of this?