Child-ASR-Paper
A list of papers for children's automatic speech recognition.
Contents
Data Augmentation
- Data Augmentation for Children’s Speech Recognition: The “Ethiopian” System for the SLT 2021 Children Speech Recognition Challenge, G. Chen, X. Na, Y. Wang, Z. Yan et al., (SLT 2021).
- Data augmentation using prosody and false starts to recognize non-native children’s speech, H. Kathania, M. Singh, T. Grósz, M. Kurimo, (Interspeech 2020).
- Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario, S. Shahnawazuddin, N. Adiga, K. Kumar, A. Poddar and W. Ahmad, (Interspeech 2020).
- Data Augmentation Based on Vowel Stretch for Improving Children’s Speech Recognition, T. Nagano, T. Fukuda, M. Suzuki, G. Kurata, (ASRU 2019).
- GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition, P. Sheng, Z. Yang, Y. Qian, (ASRU 2019).
- Improving Children’s Speech Recognition through Out-of-Domain Data Augmentation, J. Fainberg, P. Bell, M. Lincoln, S. Renals, (Interspeech 2016).
Related papers
- Audio Augmentation for Speech Recognition, T. Ko, V. Peddinti, D. Povey, S. Khudanpur, (Interspeech 2015).
- Data Augmentation for Deep Neural Network Acoustic Modeling, X. Cui, V. Goel, B. Kingsbury, (TASLP 2015).
- Data augmentation for low resource languages, A. Ragni, K. M. Knill, S. P. Rath and M. J. F. Gales, (Interspeech 2014).
- Elastic Spectral Distortion for Low Resource Speech Recognition with Deep Neural Networks, N. Kanda, R. Takeda and Y. Obuchi, (ASRU 2013).
- Vocal Tract Length Perturbation (VTLP) improves speech recognition, N. Jaitly, G. E. Hinton, (ICML 2013).
Learning from Raw Waveforms
- Improving Children Speech Recognition through Feature Learning from Raw Speech Signal, S. Pavankumar Dubagunta, Selen Hande Kabil and Mathew Magimai.-Doss, (ICASSP 2019).
- Acoustic Model Adaptation from Raw Waveforms with SincNet, Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell, Steve Renals, (ASRU 2019).
Related papers
- Understanding and Visualizing Raw Waveform-based CNNs, Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai.-Doss, Sebastien Marcel, (Interspeech 2019).
- End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition, Dimitri Palaz, Mathew Magimai.-Doss, Ronan Collobert, (Speech Communication 2019).
- Filter sampling and combination CNN (FSC-CNN): a compact CNN model for small-footprint ASR acoustic modeling using raw waveforms, Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu, Abeer Alwan, (Interspeech 2018).
- A deep neural network integrated with filterbank learning for speech recognition, Hiroshi Seki, Kazumasa Yamamoto, and Seiichi Nakagawa, (ICASSP 2017).
Spectral Normalization
- Significance of Pitch-Based Spectral Normalization for Children’s Speech Recognition, Ishwar Chandra Yadav and Gayadhar Pradhan, (SPL 2019).