MohamedMesto / MasterThesis-QU-DFKI-Accented-Speech-Recognition-ASR


Mohamed Mesto
Thesis Topic
Supervisors
Prof. Dr. Sebastian Möller
Dr. Tim Polzehl

MasterThesis: Accented Speech Recognition

Abstract

In this study, we conduct a comprehensive analysis of how accent information influences the internal representation of speech in an end-to-end automatic speech recognition (ASR) system. We use the state-of-the-art Conformer-Transducer-Large model as the basis for our ASR system. This architecture combines convolutional neural networks (CNNs) with transformers, enabling it to capture both local and global dependencies within the input audio.
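The pairing of convolution (local context) with self-attention (global context) that the Conformer block is built on can be illustrated with a toy numpy sketch. This is purely illustrative, not the Conformer-Transducer-Large implementation: the layer sizes, the single-head attention, and the hand-picked smoothing kernel are all assumptions for the example.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention: every frame attends to every
    other frame, capturing global dependencies across the sequence."""
    scores = x @ x.T / np.sqrt(x.shape[1])           # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ x                               # (T, D)

def depthwise_conv(x, kernel):
    """Depthwise 1-D convolution along time: each frame is mixed only
    with its immediate neighbours, capturing local context."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + len(kernel)] * kernel[:, None]).sum(axis=0)
    return out

def conformer_block(x, kernel):
    """Toy Conformer-style block: attention then convolution,
    each added back via a residual connection."""
    x = x + self_attention(x)
    x = x + depthwise_conv(x, kernel)
    return x

T, D = 50, 16                       # 50 audio frames, 16-dim features
x = np.random.default_rng(0).normal(size=(T, D))
y = conformer_block(x, np.array([0.25, 0.5, 0.25]))
print(y.shape)                      # sequence length and width preserved
```

The real model stacks many such blocks (plus feed-forward modules and normalization); the sketch only shows why the combination sees both local and global structure.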

To train the model, we pretrain it on a large amount of US-accented English speech and subsequently fine-tune it on a large quantity of DE-accented German speech. We evaluate the model on speech samples representing eleven distinct German accents. To investigate the impact of accents on the internal representation, we employ two primary probing techniques: (a) gradient-based explanation methods and (b) analysis of the outputs of accent and phone classifiers.
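The classifier-based probing idea can be sketched as follows: extract hidden representations from a given layer, train a lightweight accent classifier on them, and read the classifier's held-out accuracy as a measure of how much accent information that layer encodes. The sketch below is a minimal stand-in, not the thesis code: it uses synthetic Gaussian "representations" for two accents, a nearest-centroid probe instead of a trained classifier, and assumed separation values chosen only to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_representations(n_per_accent, dim, accent_shift):
    """Synthetic hidden states for two accents at one layer: the larger
    accent_shift, the more accent information the layer carries."""
    a = rng.normal(size=(n_per_accent, dim)) + accent_shift
    b = rng.normal(size=(n_per_accent, dim)) - accent_shift
    X = np.vstack([a, b])
    y = np.array([0] * n_per_accent + [1] * n_per_accent)
    return X, y

def probe_accuracy(X, y):
    """Nearest-centroid probe: fit on a random half, test on the rest."""
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    centroids = np.stack([X[train][y[train] == c].mean(axis=0)
                          for c in (0, 1)])
    dists = np.linalg.norm(X[test][:, None] - centroids[None], axis=2)
    pred = dists.argmin(axis=1)
    return (pred == y[test]).mean()

# Hypothetical early (convolutional) layer vs. a deep layer: under the
# assumed shifts, the early layer separates the accents far better.
acc_early = probe_accuracy(*layer_representations(200, 8, 1.0))
acc_deep = probe_accuracy(*layer_representations(200, 8, 0.05))
print(acc_early, acc_deep)
```

In the actual study the representations come from the ASR model's layers rather than a synthetic generator, but the per-layer comparison of probe accuracy works the same way.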

Our findings reveal consistent trends across different accents, irrespective of the probing technique employed. Moreover, we observe that the initial convolutional layer encodes the majority of accent-related information. This observation suggests possibilities for adapting the end-to-end model to learn representations that are invariant to accents.

Overall, our study offers a detailed examination of how accents are manifested in the internal representation of speech within an end-to-end ASR system.

Keywords

Accented speech recognition, accent recognition, acoustic modeling, end-to-end ASR

Contributors

License & copyright

© Mohamed Mesto. Licensed under the [MIT License](LICENSE).



Languages

Jupyter Notebook 96.0%
TeX 3.8%
Python 0.2%