MohamedMesto / MasterThesis-QU-DFKI-Accented-Speech-Recognition-ASR


Mohamed Mesto
Thesis Topic
Supervisors
Prof. Dr. Sebastian Möller
Dr. Tim Polzehl

MasterThesis: Accented Speech Recognition

Abstract

In this study, we conduct a comprehensive analysis of how accent information influences the internal representation of speech in an end-to-end automatic speech recognition (ASR) system. We use the state-of-the-art Conformer-Transducer-Large model as the basis for our ASR system. This architecture combines convolutional neural networks (CNNs) with transformers, enabling it to capture both local and global dependencies within the input audio.
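The pairing of convolution (local context) with self-attention (global context) that the Conformer block is built on can be illustrated with a toy numpy sketch. This is purely illustrative, not the Conformer-Transducer-Large implementation: the layer sizes, the single-head attention, and the hand-picked smoothing kernel are all assumptions for the example.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention: every frame attends to every
    other frame, capturing global dependencies across the sequence."""
    scores = x @ x.T / np.sqrt(x.shape[1])           # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ x                               # (T, D)

def depthwise_conv(x, kernel):
    """Depthwise 1-D convolution along time: each frame is mixed only
    with its immediate neighbours, capturing local context."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + len(kernel)] * kernel[:, None]).sum(axis=0)
    return out

def conformer_block(x, kernel):
    """Toy Conformer-style block: attention then convolution,
    each added back via a residual connection."""
    x = x + self_attention(x)
    x = x + depthwise_conv(x, kernel)
    return x

T, D = 50, 16                       # 50 audio frames, 16-dim features
x = np.random.default_rng(0).normal(size=(T, D))
y = conformer_block(x, np.array([0.25, 0.5, 0.25]))
print(y.shape)                      # sequence length and width preserved
```

The real model stacks many such blocks (plus feed-forward modules and normalization); the sketch only shows why the combination sees both local and global structure.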

To train the model, we pretrain it on a large amount of US-accented English speech and subsequently fine-tune it on a large quantity of DE-accented German speech. We evaluate the model on speech samples representing eleven distinct German accents. To investigate the impact of accents on the internal representation, we employ two primary probing techniques: (a) gradient-based explanation methods and (b) analysis of the outputs of accent and phone classifiers.
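The classifier-based probing idea can be sketched as follows: extract hidden representations from a given layer, train a lightweight accent classifier on them, and read the classifier's held-out accuracy as a measure of how much accent information that layer encodes. The sketch below is a minimal stand-in, not the thesis code: it uses synthetic Gaussian "representations" for two accents, a nearest-centroid probe instead of a trained classifier, and assumed separation values chosen only to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_representations(n_per_accent, dim, accent_shift):
    """Synthetic hidden states for two accents at one layer: the larger
    accent_shift, the more accent information the layer carries."""
    a = rng.normal(size=(n_per_accent, dim)) + accent_shift
    b = rng.normal(size=(n_per_accent, dim)) - accent_shift
    X = np.vstack([a, b])
    y = np.array([0] * n_per_accent + [1] * n_per_accent)
    return X, y

def probe_accuracy(X, y):
    """Nearest-centroid probe: fit on a random half, test on the rest."""
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    centroids = np.stack([X[train][y[train] == c].mean(axis=0)
                          for c in (0, 1)])
    dists = np.linalg.norm(X[test][:, None] - centroids[None], axis=2)
    pred = dists.argmin(axis=1)
    return (pred == y[test]).mean()

# Hypothetical early (convolutional) layer vs. a deep layer: under the
# assumed shifts, the early layer separates the accents far better.
acc_early = probe_accuracy(*layer_representations(200, 8, 1.0))
acc_deep = probe_accuracy(*layer_representations(200, 8, 0.05))
print(acc_early, acc_deep)
```

In the actual study the representations come from the ASR model's layers rather than a synthetic generator, but the per-layer comparison of probe accuracy works the same way.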

Our findings reveal consistent trends across different accents, irrespective of the probing technique employed. Moreover, we observe that the initial convolutional layer encodes the majority of accent-related information. This observation suggests possibilities for adapting the end-to-end model to learn representations that are invariant to accents.

Overall, our study offers a detailed examination of how accents are manifested in the internal representation of speech within an end-to-end ASR system.

Keywords

Accented speech recognition, accent recognition, acoustic modeling, end-to-end ASR

Contributors

License & copyright

© Mohamed Mesto. Licensed under the [MIT License](LICENSE).



Languages

Jupyter Notebook 96.0%
TeX 3.8%
Python 0.2%