Voice-Privacy-Challenge / Voice-Privacy-Challenge-2020

Baseline Recipe for VoicePrivacy Challenge 2020: https://www.voiceprivacychallenge.org/vp2020/docs/VoicePrivacy_2020_Eval_Plan_v1_3.pdf

Results differ slightly, VCTK dev ASR WER at 99%

dspavankumar opened this issue

Hello,

When I run the recipe I get slightly different results; the most notable difference is the VCTK dev ASR WER at 99%. I've attached my results, could you please have a look at them?
results_baseline.txt

Dear Pavankumar,

Thank you very much for reporting the issue.

  1. The results are not fully deterministic due to random factors in the anonymization process and signal processing. Hence, a slight difference in results between runs is possible, and we plan to estimate and report the corresponding value intervals.

  2. Regarding these results:

ASR-vctk_dev_asr
%WER 99.36 [ 86100 / 86658, 581 ins, 3081 del, 82438 sub ] exp/models/asr_eval/decode_vctk_dev_asr_tgsmall/wer_17_1.0
%WER 99.49 [ 86212 / 86658, 710 ins, 1899 del, 83603 sub ] exp/models/asr_eval/decode_vctk_dev_asr_tglarge/wer_17_1.0
ASR-vctk_dev_asr_anon
%WER 99.58 [ 86295 / 86658, 737 ins, 3793 del, 81765 sub ] exp/models/asr_eval/decode_vctk_dev_asr_anon_tgsmall/wer_17_1.0
%WER 99.82 [ 86502 / 86658, 966 ins, 2456 del, 83080 sub ] exp/models/asr_eval/decode_vctk_dev_asr_anon_tglarge/wer_17_1.0

These results are not normal and indicate a bug. ASR-vctk_dev_asr corresponds to the ASR results for the original (non-anonymized) data.
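
For reference, Kaldi's %WER line reports (insertions + deletions + substitutions) divided by the number of reference words. Below is a minimal sketch of that arithmetic using the counts from the first tgsmall line above; the numbers are copied from the output, everything else is illustrative:

```python
# Minimal sketch: how Kaldi's %WER line is computed from its error counts.
# Counts below are taken from the tgsmall decode of vctk_dev_asr shown above.
ins, dele, sub = 581, 3081, 82438   # insertions, deletions, substitutions
ref_words = 86658                   # total words in the reference transcripts

errors = ins + dele + sub           # 86100
wer = 100.0 * errors / ref_words    # ~99.36 %WER
print(f"%WER {wer:.2f} [ {errors} / {ref_words} ]")
```

Note that almost all of the errors are substitutions, i.e., nearly every reference word is scored as wrong.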

Could you please send us:

  1. directory with all content (as an archive file): baseline/exp/models/asr_eval/decode_vctk_dev_asr_tgsmall/log/

  2. file: /baseline/exp/models/asr_eval/decode_vctk_dev_asr_tgsmall/scoring/test_filt.txt

Could you please also keep the content of the following directories:

  1. data/vctk_dev_asr_hires

  2. exp/models/asr_eval/decode_vctk_dev_asr_tgsmall

  3. data/vctk_dev_asr

Thank you.

Thank you, Natalia, for your response. Attached are the logs (I have omitted the feature files and the decoding lattices to keep the archive small). It looks more like a language model issue than an acoustic model one, because the decoded sentences are acoustically very close to their corresponding references.
directories.tar.gz

Thank you, Pavankumar.

The problem is in the text files: in the directories vctk_dev_asr and vctk_dev_asr_hires that you sent (and also in decode_vctk_dev_asr_tgsmall/test_filt.txt), the text is in lowercase, but it should be in uppercase.

In the archive file vctk_dev.tar.gz, which is downloaded from the challenge server in run.sh, stage 0, the text files for the vctk_dev and vctk_test datasets are in lowercase (as they are in the original VCTK corpus). In the same stage, however, all the text files are "normalized" for speech recognition assessment (in the script https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020/blob/master/baseline/local/download_data.sh): letters are converted to uppercase and punctuation is removed.
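
For illustration, here is a minimal sketch of that kind of normalization applied to a Kaldi-style text file (utterance ID followed by the transcript on each line). The file paths and the exact character handling (e.g., keeping apostrophes) are assumptions for this example, not a copy of download_data.sh:

```python
import re

# Hypothetical paths, shown only for illustration.
src = "data/vctk_dev_asr/text"        # utterance-id followed by transcript
dst = "data/vctk_dev_asr/text.norm"

with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
    for line in fin:
        utt_id, _, transcript = line.rstrip("\n").partition(" ")
        transcript = transcript.upper()                     # letters to uppercase
        transcript = re.sub(r"[^A-Z' ]", " ", transcript)   # drop punctuation, keep apostrophes
        transcript = " ".join(transcript.split())           # collapse extra whitespace
        fout.write(f"{utt_id} {transcript}\n")
```

Because the scoring is case-sensitive, lowercase reference transcripts scored against uppercase hypotheses turn almost every word into a substitution, which is consistent with the ~99% WER and the very high substitution counts above.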

So, normally, if you run the run.sh script from the very beginning without modification, all the directories in data/ should contain "normalized" text files (no punctuation and all letters in uppercase).
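
If in doubt, a quick check along these lines (a sketch; the glob over data/ is just an example) reports any text file that still contains lowercase letters or punctuation:

```python
import re
from pathlib import Path

# Check every Kaldi text file under data/; adjust the glob to the subsets you use.
for text_file in sorted(Path("data").glob("*/text")):
    bad = [line for line in text_file.read_text(encoding="utf-8").splitlines()
           if re.search(r"[a-z]", line.split(" ", 1)[-1])   # lowercase in the transcript part
           or re.search(r"[.,!?;:\"]", line)]               # leftover punctuation
    if bad:
        print(f"{text_file}: {len(bad)} unnormalized lines, e.g. {bad[0][:80]!r}")
```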

Thank you very much, Natalia. Stage 0 must have run incompletely for VCTK dev; we had the text file downloaded but unnormalised. After normalisation I can see comparable results. Thanks a lot again, and best regards!
Pavan.