automatic-speech-recognition disfluency disfluency-detection disfluency-detector interspeech interspeech2024 speech-recognition speech-to-text whisperx google-asr

README.txt

Code Design

The code is structured as two pipelines of scripts. The following diagrams capture the dependency structure of the scripts (each script depends on the output of the script before it):

Modules

Install the dependencies listed for each of these modules, following the instructions on each module's page.

evaluate by HuggingFace

Link: https://github.com/huggingface/evaluate
Library Version: 0.4.0
Python Version: 3.8

torcheval by PyTorch

Link: https://github.com/pytorch/torcheval
Library Version: 0.0.7
Python Version: 3.8

We had to modify the torcheval code, so we provide the modified code here as a subdirectory.

Spotify Podcast Dataset

Link: https://podcastsdataset.byspotify.com/

This dataset is maintained by Spotify, and access to the dataset is determined by Spotify.

Additional Dependencies

Pandas (Link: https://pandas.pydata.org/)
tqdm (Link: https://github.com/tqdm/tqdm)
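
As a rough aid for the version pins listed above, the following sketch compares the installed versions of evaluate and torcheval with the versions stated in this README. It is not part of the repository's pipelines. The names EXPECTED, installed_version, and find_mismatches are made up for illustration. It assumes both libraries are installed as regular distributions; since the modified torcheval is shipped here as a subdirectory, it may show up as "not installed" even when it imports correctly.

```python
import sys
from importlib import metadata

# Versions stated in this README; pandas and tqdm are not pinned there.
EXPECTED = {"evaluate": "0.4.0", "torcheval": "0.0.7"}
EXPECTED_PYTHON = (3, 8)


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if (major, minor) != EXPECTED_PYTHON:
        print(f"Python 3.8 expected, running {major}.{minor}")
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the file prints one line per mismatch and nothing when the environment matches the versions listed above.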

About

Code for our INTERSPEECH 2024 paper: Comparing ASR Systems in the Context of Speech Disfluencies.

https://www.comparing-asr-systems.com


MIT License
