INTERSPEECH 2019

Spoken Language Processing

Special Session 1-6: Spoken Language Processing for Children’s Speech

Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur. Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network [INTERSPEECH 2019]
- ASR TDNN-F Data augmentation Vocal tract length normalization (VTLN)
- TDNN-F: frame subsampling + SVD-style low-rank factorization of the weight matrices
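The factorization noted for TDNN-F can be illustrated with simple parameter arithmetic. This is a minimal sketch, not the paper's configuration: the helper name `factored_param_count` and the layer sizes (1536 hidden units, 160-dimensional bottleneck) are assumptions chosen for illustration.

```python
def factored_param_count(in_dim, out_dim, bottleneck):
    """Weights in a dense layer before and after a low-rank factorization.

    A full in_dim x out_dim matrix is replaced by two factors:
    in_dim x bottleneck and bottleneck x out_dim.
    """
    full = in_dim * out_dim
    factored = in_dim * bottleneck + bottleneck * out_dim
    return full, factored


# Illustrative sizes only (assumed, not taken from the paper).
full, factored = factored_param_count(1536, 1536, 160)
```

With these assumed sizes the layer shrinks from 2,359,296 to 491,520 weights, roughly a 4.8x reduction.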

Oral 1-1: End-to-End Speech Recognition

Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde. Jasper: An End-to-End Convolutional Neural Acoustic Model [INTERSPEECH 2019]
- ASR LibriSpeech WSJ Hub5'00 Conv1D Dense Residual NovoGrad

Poster 1-A: Speaker Recognition and Diarization

Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai. Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System [INTERSPEECH 2019]
- TI-SV VoxCeleb Multi-stage aggregation (MSA) DALoss
Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka. Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding [INTERSPEECH 2019]
- TI-SV VoxCeleb SITW Data augmentation DNN based bandwidth extension
- After voice conversion (e.g., speed perturbation), the converted utterances should be assigned new speaker labels
- Use DNN to estimate missing filter banks for low-bandwidth features
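The relabeling idea in the notes above can be sketched as follows. This only builds the label bookkeeping, not the audio processing; the function name `augment_with_speed`, the label suffix format, and the speed factors 0.9 and 1.1 are assumptions for illustration.

```python
def augment_with_speed(utterances, speeds=(0.9, 1.1)):
    """Return (utt_id, speaker_label, speed) triples for originals and perturbed copies.

    `utterances` is a list of (utt_id, speaker_id) pairs. Every perturbed copy
    gets a new speaker label, so each speed factor creates a new pseudo-speaker.
    """
    out = []
    for utt_id, spk in utterances:
        out.append((utt_id, spk, 1.0))  # the original keeps its label
        for s in speeds:
            out.append((f"{utt_id}-sp{s}", f"{spk}-sp{s}", s))
    return out


data = [("utt1", "spkA"), ("utt2", "spkA"), ("utt3", "spkB")]
augmented = augment_with_speed(data)
n_speakers = len({spk for _, spk, _ in augmented})
```

Two original speakers with two speed factors give six training speakers in this toy set.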

Special Session 4-A: The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge - P

Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak. ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks [INTERSPEECH 2019]
- TI-SV Anti-spoofing Acoustic features Squeeze-and-excitation Residual
Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm. Ensemble Models for Spoofing Detection in Automatic Speaker Verification [INTERSPEECH 2019]
- TI-SV Anti-spoofing Ensemble
- Ensemble traditional ML models (GMM, SVM) and deep models (CNN, CRNN, Sample-level CNN, Wave-U-Net)

Oral 3-2: Speaker Recognition 1

Gautam Bhattacharya, Jahangir Alam, Patrick Kenny. Deep Speaker Recognition: Modular or Monolithic? [INTERSPEECH 2019]
- TI-SV VoxCeleb AAM-Softmax Neural backend
- VoxCeleb1 EER 0.55%
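For readers unfamiliar with the metric, EER is the operating point where the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping observed scores as thresholds; `equal_error_rate` is a hypothetical helper, and evaluation toolkits typically interpolate more carefully.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by using every observed score as a decision threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
```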
Shuai Wang, Johan Rohdin, Lukáš Burget, Oldřich Plchot, Yanmin Qian, Kai Yu, Jan Černocký. On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction [INTERSPEECH 2019]
- TI-SV VoxCeleb Multi-task learning Phonetic information
- Encourage phonetic information at the frame-level stage and suppress it at the segment-level stage
Mirco Ravanelli, Yoshua Bengio. Learning Speaker Representations with Mutual Information [INTERSPEECH 2019]
- TI-SV TIMIT LibriSpeech VoxCeleb Unsupervised learning Mutual information
Lanhua You, Wu Guo, Li-Rong Dai, Jun Du. Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification [INTERSPEECH 2019]
- TI-SV NIST SRE VOiCES Multi-task learning
- Use the first- and higher-order statistics as the reconstruction targets
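The statistics used as reconstruction targets can be sketched as per-dimension moments over an utterance's frames. This sketch assumes "higher-order" means standardized third and fourth moments (skewness and kurtosis) on top of mean and standard deviation; the paper's exact choice is not stated in this list, and `high_order_stats` is a hypothetical name.

```python
import math


def high_order_stats(frames):
    """Per-dimension (mean, std, skewness, kurtosis) over a list of frame vectors."""
    n = len(frames)
    dim = len(frames[0])
    stats = []
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if std == 0.0:
            skew, kurt = 0.0, 0.0  # constant dimension: moments set to 0 by convention here
        else:
            skew = sum(((x - mean) / std) ** 3 for x in col) / n
            kurt = sum(((x - mean) / std) ** 4 for x in col) / n
        stats.append((mean, std, skew, kurt))
    return stats


stats = high_order_stats([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
```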
Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu. Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification [INTERSPEECH 2019]
- TI-SV NIST SRE Data augmentation Conditional VAE
- Train CVAE on manually augmented samples, then generate more embeddings for training PLDA
Lanhua You, Wu Guo, Li-Rong Dai, Jun Du. Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification [INTERSPEECH 2019]
- TI-SV NIST SRE Gated CNN Gated-attention statistics pooling

Oral 4-1: Speaker and Language Recognition 1

Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu. RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification [INTERSPEECH 2019]
Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li. Target Speaker Extraction for Multi-Talker Speaker Verification [INTERSPEECH 2019]

Oral 5-5: Speaker Recognition Evaluation

Special Session 7-3: The VOiCES from a Distance Challenge - O

Special Session 7-A: The VOiCES from a Distance Challenge - P

Oral 8-5: Speaker Recognition 2

Themos Stafylakis, Johan Rohdin, Oldřich Plchot, Petr Mizera, Lukáš Burget. Self-Supervised Speaker Embeddings [INTERSPEECH 2019]
- TI-SV VoxCeleb SITW Self-supervised
- Reconstruct frames using speaker embedding and ASR outputs
Andreas Nautsch, Jose Patino, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider, Nicholas Evans. Privacy-Preserving Speaker Recognition with Cohort Score Normalisation [INTERSPEECH 2019]
Yi Liu, Liang He, Jia Liu. Large Margin Softmax Loss for Speaker Verification [INTERSPEECH 2019]
- TI-SV VoxCeleb AM-Softmax
- VoxCeleb1 EER 2.00%
- Uses a large weight decay of 0.01
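The AM-Softmax tag above refers to a softmax loss over scaled cosine similarities in which a margin is subtracted from the target-class cosine. A minimal single-sample sketch follows; the defaults `scale=30.0` and `margin=0.35` are common illustrative values, not settings taken from this paper.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for one sample.

    `cosines[j]` is the cosine between the embedding and the weight vector of class j.
    """
    logits = [scale * (c - margin if j == target else c)
              for j, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


cosines = [0.8, 0.1, -0.2]
loss_with_margin = am_softmax_loss(cosines, target=0)
loss_no_margin = am_softmax_loss(cosines, target=0, margin=0.0)
```

With `margin=0.0` and `scale=1.0` this reduces to ordinary softmax cross-entropy on the cosines; the margin makes the same sample count as harder, pushing target cosines higher during training.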
Amirhossein Hajavi, Ali Etemad. A Deep Neural Network for Short-Segment Speaker Recognition [INTERSPEECH 2019]
- TI-SV VoxCeleb Short duration Multi-stage aggregation
- Apply a non-linear aggregator over embeddings from different stages
Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong. Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function [INTERSPEECH 2019]
- TI-SV VoxCeleb Conv1D Squeeze-and-excitation (SE) Additive supervision softmax
- Use statistics pooling to replace global average pooling in squeeze-and-excitation
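The modification described above can be sketched in plain Python: the squeeze step summarizes each channel by its mean and standard deviation over time, rather than the mean alone. This is a sketch under assumptions: concatenating mean and std, omitting biases, and the names `se_with_stats_pooling`, `w1`, and `w2` are illustrative, not the paper's implementation.

```python
import math


def se_with_stats_pooling(x, w1, w2):
    """Rescale channels of x (channels x frames) with a squeeze-and-excitation gate.

    Squeeze: per-channel mean and std over time, concatenated (length 2*C).
    Excite:  two bias-free dense layers, ReLU then sigmoid, mapping 2*C -> H -> C.
    """
    means = [sum(ch) / len(ch) for ch in x]
    stds = [math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch))
            for ch, m in zip(x, means)]
    squeezed = means + stds
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Each channel is multiplied by its gate in (0, 1).
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


x = [[1.0, 3.0], [2.0, 2.0]]          # 2 channels, 2 frames
w1 = [[0.0, 0.0, 0.0, 0.0]]           # H = 1, input size 2*C = 4
w2 = [[0.0], [0.0]]                   # C = 2 outputs
scaled = se_with_stats_pooling(x, w1, w2)
```

With all-zero weights every gate is 0.5, so each channel is simply halved; trained weights would produce channel-specific gates.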
Suwon Shon, Hao Tang, James Glass. VoiceID Loss: Speech Enhancement for Speaker Verification [INTERSPEECH 2019]
- TI-SV Speech enhancement (SE) VoxCeleb Mask Conv1D

Poster 6-A: Speaker Recognition and Anti-Spoofing

Poster 9-A: Speaker and Language Recognition 2

Poster 10-A: Speaker Recognition 3
