The family of UniSpeech:
UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR
UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training
- [Model Release] October 13, 2021: UniSpeech-SAT models are released.
- [HuggingFace Integration] October 11, 2021: UniSpeech models are on HuggingFace.
- [Model Release] June, 2021: UniSpeech v1 models are released.
We strongly suggest using our UniSpeech-SAT model for speaker-related tasks, since it performs strongly on a range of speaker-related benchmarks.
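As a minimal sketch of how a UniSpeech-SAT checkpoint from HuggingFace might be used for a speaker-related task, the snippet below extracts speaker embeddings and compares two of them with cosine similarity. The checkpoint name `microsoft/unispeech-sat-base-plus-sv`, the helper names `embed_utterances`, `cosine_similarity`, and `same_speaker`, and the decision threshold are assumptions for illustration and are not taken from this README. The `transformers` and `torch` imports are done lazily, so the comparison helpers can be used on their own.

```python
import math
from typing import List, Sequence

# Assumed checkpoint name on the HuggingFace Hub; replace with the one you use.
CHECKPOINT = "microsoft/unispeech-sat-base-plus-sv"


def embed_utterances(waveforms: Sequence[Sequence[float]],
                     sampling_rate: int = 16000) -> List[List[float]]:
    """Return one speaker embedding per waveform (needs torch and transformers)."""
    # Imported here so the helpers below work without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, UniSpeechSatForXVector

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
    model = UniSpeechSatForXVector.from_pretrained(CHECKPOINT)
    model.eval()
    inputs = extractor([list(w) for w in waveforms], sampling_rate=sampling_rate,
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        embeddings = model(**inputs).embeddings
    return embeddings.tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], *, threshold: float) -> bool:
    """Accept the pair if similarity reaches the threshold.

    The threshold is dataset-dependent and has to be tuned by the caller.
    """
    return cosine_similarity(a, b) >= threshold
```

A typical call would pass two 16 kHz waveforms to `embed_utterances` and then feed the two resulting vectors to `same_speaker` with a threshold tuned on held-out trials.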
We also evaluate our models on typical speaker-related benchmarks.
Model | Fix pre-train | Vox1-O | Vox1-E | Vox1-H |
---|---|---|---|---|
ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
HuBERT large | Yes | 0.888 | 0.912 | 1.853 |
Wav2Vec2.0 (XLSR) | Yes | 0.915 | 0.945 | 1.895 |
UniSpeech-SAT large | Yes | 0.771 | 0.781 | 1.669 |
HuBERT large | No | 0.585 | 0.654 | 1.342 |
Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
Regarding reproduction, please contact Zhengyang
Evaluation on LibriCSS
Model | 0S | 0L | OV10 | OV20 | OV30 | OV40 |
---|---|---|---|---|---|---|
Conformer (SOTA) | 4.5 | 4.4 | 6.2 | 8.5 | 11 | 12.6 |
UniSpeech-SAT base | 4.4 | 4.4 | 5.4 | 7.2 | 9.2 | 10.5 |
UniSpeech-SAT large | 4.3 | 4.2 | 5.0 | 6.3 | 8.2 | 8.8 |
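To make the gap in the table easier to read, the following sketch computes the relative reduction of UniSpeech-SAT large against the Conformer baseline for each condition. It assumes the figures are error rates where lower is better; the function name `relative_reduction` is introduced here for illustration only.

```python
# Figures copied from the LibriCSS table; lower is assumed to be better.
CONDITIONS = ["0S", "0L", "OV10", "OV20", "OV30", "OV40"]
CONFORMER = [4.5, 4.4, 6.2, 8.5, 11.0, 12.6]
UNISPEECH_SAT_LARGE = [4.3, 4.2, 5.0, 6.3, 8.2, 8.8]


def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction of `system` against `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


# One entry per condition, keyed by the column name used in the table.
reductions = {
    cond: relative_reduction(base, sat)
    for cond, base, sat in zip(CONDITIONS, CONFORMER, UNISPEECH_SAT_LARGE)
}

for cond, value in reductions.items():
    print(f"{cond}: {value:.1f}% relative reduction")
```

Under these assumptions the gain grows with the overlap ratio, reaching roughly 30% on OV40.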
The paper will appear soon.
Regarding reproduction, please contact Sanyuan
Evaluation on CALLHOME
Model | spk_2 | spk_3 | spk_4 | spk_5 | spk_6 | spk_all |
---|---|---|---|---|---|---|
EEND-vector clustering | 7.96 | 11.93 | 16.38 | 21.21 | 23.1 | 12.49 |
EEND-EDA clustering (SOTA) | 7.11 | 11.88 | 14.37 | 25.95 | 21.95 | 11.84 |
UniSpeech-SAT large | 5.93 | 10.66 | 12.9 | 16.48 | 23.25 | 10.92 |
The paper will appear soon.
Regarding reproduction, please contact Zhengyang
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.
Microsoft Open Source Code of Conduct
If you find our work useful in your research, please cite the following papers:
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
For help or issues using UniSpeech models, please submit a GitHub issue.
For other communications related to UniSpeech, please contact Yu Wu (yuwu1@microsoft.com).