zelaki/DisfluentFA

disfluency-detection forced-alignment interspeech2023

Forced Alignment for Disfluent Speech

Code for: Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

Presented at ISCA Interspeech 2023.

Project Status

This repository is under active development. Contributions and feedback are welcome!

Intro

The study of speech disorders can benefit greatly from time-aligned data. However, audio-text mismatches in disfluent speech cause rapid performance degradation for modern speech aligners, hindering the use of automatic approaches. In this work, we propose a simple and effective modification of alignment graph construction of CTC-based models using Weighted Finite State Transducers. The proposed weakly-supervised approach alleviates the need for verbatim transcription of speech disfluencies for forced alignment. During the graph construction, we allow the modeling of common speech disfluencies, i.e. repetitions and omissions.
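To make the idea of allowing repetitions and omissions in the alignment graph more concrete, here is a minimal sketch in plain Python. It is not the code from this repository and does not use a WFST library: the `Arc` type, the `build_alignment_arcs` function, the `<eps>` label, and the penalty values are all hypothetical, chosen only for illustration. It builds a linear chain of states over a phoneme sequence and adds, for each phoneme, one arc that skips it (omission) and one arc that returns to the state before it (repetition).

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    src: int       # source state
    dst: int       # destination state
    label: str     # phoneme consumed on this arc, or "<eps>" for none
    weight: float  # penalty; higher means less preferred (illustrative values)


def build_alignment_arcs(
    phonemes: List[str],
    skip_penalty: float = 1.0,
    repeat_penalty: float = 1.0,
) -> List[Arc]:
    """Return arcs for a linear graph with extra omission and repetition arcs."""
    arcs: List[Arc] = []
    for i, phoneme in enumerate(phonemes):
        # Regular path: consume phoneme i and move from state i to i + 1.
        arcs.append(Arc(i, i + 1, phoneme, 0.0))
        # Omission: move past phoneme i without consuming it.
        arcs.append(Arc(i, i + 1, "<eps>", skip_penalty))
        # Repetition: go back to state i so phoneme i can be consumed again.
        arcs.append(Arc(i + 1, i, "<eps>", repeat_penalty))
    return arcs


arcs = build_alignment_arcs(["k", "ae", "t"])
for arc in arcs:
    print(arc)
```

With three phonemes this produces nine arcs over states 0 to 3. A transcript that omits or repeats a phoneme still has a path through the graph, at the cost of the corresponding penalty, which is the property the weakly-supervised setting relies on.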

Usage

Weakly-Supervised Forced Alignment

optional arguments:
  -a AUDIO, --audio AUDIO  Path to the audio file.
  -t TEXT, --text TEXT     Path to the text file.
  -w, --write_textgrid     Specify whether to write a TextGrid file.
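The options above look like standard `argparse` help output. As a rough illustration of how such an interface can be declared and parsed, the sketch below reproduces the three listed flags. The name of the repository's entry-point script is not given here, so the `build_parser` function, the description string, and the file names `sample.wav` and `sample.txt` are assumptions made for this example only. Treating `-w` as a boolean switch is also an assumption, based on it taking no value in the help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three options listed in the usage text."""
    parser = argparse.ArgumentParser(
        description="Weakly-supervised forced alignment (illustrative parser)"
    )
    parser.add_argument("-a", "--audio", help="Path to the audio file.")
    parser.add_argument("-t", "--text", help="Path to the text file.")
    # Assumed to be a boolean switch because the help text shows no value for it.
    parser.add_argument(
        "-w",
        "--write_textgrid",
        action="store_true",
        help="Specify whether to write a TextGrid file.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["-a", "sample.wav", "-t", "sample.txt", "-w"])
print(args.audio, args.text, args.write_textgrid)
```

Omitting `-w` leaves `args.write_textgrid` as `False`, so under this reading a TextGrid file would only be written when the flag is passed.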

About

Weakly supervised forced alignment for disfluent speech
