Results of Funcodec
Slyne opened this issue
Bit rate: 8 kbps
Downstream tasks (only the 16 kHz model was used)
Stage 1: Run speech emotion recognition.
Acc: 75.21%
Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs
Run speaker verification.
EER: 1.56%
Stage 3: Run automatic speech recognition.
WER: 3.13%
Stage 4: Run audio event classification.
ACC: 83.30%
For reference, DAC at 44.1 kHz got ACC: 90.55% on audio_event_classification.
Objective results (16 kHz model for 16 kHz samples, 48 kHz model for 48 kHz samples)
Log results
--------------------------------------------------
File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation.
SDR: mean score is: 7.664355354532293
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9301372
Stage 3: Run STOI.
stoi: mean score is: 0.8652290511677259
Stage 4: Run PESQ.
pesq: mean score is: 1.9714515495300293
--------------------------------------------------
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation.
SDR: mean score is: 0.28843353322945814
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5668296
--------------------------------------------------
File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.47528477173951
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.4804714
Stage 3: Run STOI.
stoi: mean score is: 0.9478413458556251
Stage 4: Run PESQ.
pesq: mean score is: 3.0518312084674837
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation.
SDR: mean score is: 1.651041018826226
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9033759
--------------------------------------------------
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.275478100428441
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.23099
--------------------------------------------------
File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation.
SDR: mean score is: 3.6701485211578273
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5391313
Stage 3: Run STOI.
stoi: mean score is: 0.9362651811605514
Stage 4: Run PESQ.
pesq: mean score is: 2.1895537614822387
--------------------------------------------------
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.627505998814492
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5454265
Stage 3: Run STOI.
stoi: mean score is: 0.9568509707064634
Stage 4: Run PESQ.
pesq: mean score is: 3.316485096216202
--------------------------------------------------
File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.899273166546299
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.237886
Stage 3: Run STOI.
stoi: mean score is: 0.9110949624359219
Stage 4: Run PESQ.
pesq: mean score is: 2.5656625175476075
--------------------------------------------------
File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation.
SDR: mean score is: 11.001265123350482
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.7819229
Stage 3: Run STOI.
stoi: mean score is: 0.9753332596498754
Stage 4: Run PESQ.
pesq: mean score is: 3.383010833263397
--------------------------------------------------
File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.071351215845228
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.1897244
Stage 3: Run STOI.
stoi: mean score is: 0.9018324319464593
Stage 4: Run PESQ.
pesq: mean score is: 1.928473423719406
--------------------------------------------------
File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation.
SDR: mean score is: 7.051308404176289
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.8565342
Stage 3: Run STOI.
stoi: mean score is: 0.9340248268933423
Stage 4: Run PESQ.
pesq: mean score is: 3.0424613475799562
--------------------------------------------------
Average SDR for speech datasets: 7.682561569520302
Average Mel_Loss for speech datasets: 1.6951542375
Average STOI for speech datasets: 0.9285590037269955
Average PESQ for speech datasets: 2.68111621722579
Average SDR for audio datasets: 2.7383175508280417
Average Mel_Loss for audio datasets: 1.5670651666666666
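For anyone who wants to re-derive the averages, they appear to be plain unweighted means of the per-dataset mean scores. Below is a minimal sketch that parses logs in the format pasted above and averages one metric per group. The speech/audio grouping is an assumption inferred from which logs report STOI and PESQ, and `parse_log`, `group_average`, and `SAMPLE_LOG` are names made up for this example, not part of the Codec-SUPERB scripts.

```python
import re
from statistics import mean

# Assumption: these three are the "audio" datasets; every other log is "speech".
AUDIO_DATASETS = {"esc50", "fsd50k", "gunshot_triangulation"}

NAME_RE = re.compile(r"^File Name: (.+)\.log$")
SCORE_RE = re.compile(r"^(SDR|mel_loss|stoi|pesq): mean score is: (-?\d+(?:\.\d+)?)$")


def parse_log(text):
    """Return {dataset: {metric: mean score}} from log text in the format above."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            results[current] = {}
            continue
        score = SCORE_RE.match(line)
        if score and current is not None:
            # Metric names are lower-cased so "SDR" is stored as "sdr".
            results[current][score.group(1).lower()] = float(score.group(2))
    return results


def group_average(results, metric, audio):
    """Unweighted mean of one metric over the audio (or speech) datasets."""
    values = [
        scores[metric]
        for dataset, scores in results.items()
        if (dataset in AUDIO_DATASETS) == audio and metric in scores
    ]
    return mean(values)


# The three audio logs from the results above, copied verbatim.
SAMPLE_LOG = """\
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation.
SDR: mean score is: 0.28843353322945814
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5668296
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation.
SDR: mean score is: 1.651041018826226
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9033759
--------------------------------------------------
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.275478100428441
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.23099
"""

results = parse_log(SAMPLE_LOG)
print("Average SDR for audio datasets:", group_average(results, "sdr", audio=True))
print("Average Mel_Loss for audio datasets:", group_average(results, "mel_loss", audio=True))
```

The printed values should agree with the two audio averages reported above, up to floating-point rounding. Passing `audio=False` with the full log averages the speech datasets instead.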
Thanks for submitting the results. Could you also refer to Section 4.2 of the rules (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) and let us know how to run inference with your model? We will use your model to test on the hidden set.
Bit rate: 8 kbps, model trained on 16 kHz samples only
Downstream tasks
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition.
Acc: 75.21%
Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs
Run speaker verification.
EER: 1.56%
Stage 3: Run automatic speech recognition.
WER: 3.13%
Stage 4: Run audio event classification.
ACC: 83.30%
Objective results
Log results
--------------------------------------------------
File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation.
SDR: mean score is: 2.2232599443745995
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.5125315
Stage 3: Run STOI.
stoi: mean score is: 0.8384541409928323
Stage 4: Run PESQ.
pesq: mean score is: 1.5559590673446655
--------------------------------------------------
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation.
SDR: mean score is: -4.602151194644759
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.3825583
--------------------------------------------------
File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.47528477173951
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.4804714
Stage 3: Run STOI.
stoi: mean score is: 0.9478413458556251
Stage 4: Run PESQ.
pesq: mean score is: 3.0518312084674837
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation.
SDR: mean score is: -2.0076792522998725
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.457246
--------------------------------------------------
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.94366284167626
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.6988914
--------------------------------------------------
File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation.
SDR: mean score is: 3.6701485211578273
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5391313
Stage 3: Run STOI.
stoi: mean score is: 0.9362651811605514
Stage 4: Run PESQ.
pesq: mean score is: 2.1895537614822387
--------------------------------------------------
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.627505998814492
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5454265
Stage 3: Run STOI.
stoi: mean score is: 0.9568509707064634
Stage 4: Run PESQ.
pesq: mean score is: 3.316485096216202
--------------------------------------------------
File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.899273166546299
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.237886
Stage 3: Run STOI.
stoi: mean score is: 0.9110949624359219
Stage 4: Run PESQ.
pesq: mean score is: 2.5656625175476075
--------------------------------------------------
File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation.
SDR: mean score is: 11.001265123350482
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.7819229
Stage 3: Run STOI.
stoi: mean score is: 0.9753332596498754
Stage 4: Run PESQ.
pesq: mean score is: 3.383010833263397
--------------------------------------------------
File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.818639103419933
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9194256
Stage 3: Run STOI.
stoi: mean score is: 0.9198648881684639
Stage 4: Run PESQ.
pesq: mean score is: 1.922272914648056
--------------------------------------------------
File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation.
SDR: mean score is: 7.051308404176289
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.8565342
Stage 3: Run STOI.
stoi: mean score is: 0.9340248268933423
Stage 4: Run PESQ.
pesq: mean score is: 3.0424613475799562
--------------------------------------------------
Average SDR for speech datasets: 6.845835629197429
Average Mel_Loss for speech datasets: 1.8591661750000001
Average STOI for speech datasets: 0.9274661969828843
Average PESQ for speech datasets: 2.6284045933187006
Average SDR for audio datasets: 0.11127746491054295
Average Mel_Loss for audio datasets: 2.1795652333333333
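To make the gap between the two submissions easier to read, here is a small sketch that compares the averages from the first results post (16 kHz and 48 kHz models) with the ones from the 16 kHz-only post. The numbers are copied from the logs in this thread. The dictionary names and the `compare` helper are made up for illustration. It relies on SDR, STOI, and PESQ being higher-is-better and mel loss being lower-is-better.

```python
# Averages copied from the two results posts in this thread.
# First post: 16 kHz model on 16 kHz samples, 48 kHz model on 48 kHz samples.
MIXED_MODELS = {
    ("speech", "SDR"): 7.682561569520302,
    ("speech", "Mel_Loss"): 1.6951542375,
    ("speech", "STOI"): 0.9285590037269955,
    ("speech", "PESQ"): 2.68111621722579,
    ("audio", "SDR"): 2.7383175508280417,
    ("audio", "Mel_Loss"): 1.5670651666666666,
}
# Second post: model trained on 16 kHz samples only.
ONLY_16K = {
    ("speech", "SDR"): 6.845835629197429,
    ("speech", "Mel_Loss"): 1.8591661750000001,
    ("speech", "STOI"): 0.9274661969828843,
    ("speech", "PESQ"): 2.6284045933187006,
    ("audio", "SDR"): 0.11127746491054295,
    ("audio", "Mel_Loss"): 2.1795652333333333,
}

# Mel spectrogram loss is the only metric here where a smaller value is better.
LOWER_IS_BETTER = {"Mel_Loss"}


def compare(baseline, other):
    """Return (group, metric, delta, verdict) rows, with delta = other - baseline."""
    rows = []
    for (group, metric), base in baseline.items():
        delta = other[(group, metric)] - base
        if delta == 0:
            verdict = "same"
        elif (delta < 0) == (metric in LOWER_IS_BETTER):
            verdict = "better"
        else:
            verdict = "worse"
        rows.append((group, metric, delta, verdict))
    return rows


rows = compare(MIXED_MODELS, ONLY_16K)
for group, metric, delta, verdict in rows:
    print(f"{group:6s} {metric:8s} {delta:+.4f}  {verdict}")
```

The verdict column describes the 16 kHz-only numbers relative to the first post, so "worse" means the 16 kHz-only model scored lower on that average.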
If possible, could you also refer to Section 4.2 of the rules (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) and send us a brief description of your model and instructions for running inference? We will use your model to test on the hidden set.
I was uploading the models.
Just sent the email. Please check.
Thanks!
Perfect. Thank you.