Phone Classification using Wav2Vec2

This repository contains SpeechBrain recipes for fine-tuning Wav2Vec2 models on a phone classification task. The following factors were analysed:

Fine-tuning Wav2Vec2,
Pre-training datasets,
Model size,
Fine-tuning datasets.

The results of this work were published at the Interspeech 2024 conference.

Code

The recipes folder contains all SpeechBrain recipes.
The results obtained are available in the confusion-matrix/ folder.
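
For readers unfamiliar with how to read such results, here is a minimal sketch of how a phone confusion matrix and per-phone accuracy can be computed from reference and predicted labels. It is illustrative only and is not taken from the recipes: the function names, the toy phone set and the label sequences are all hypothetical, and the format of the files in confusion-matrix/ is not assumed here.

```python
from collections import Counter


def confusion_matrix(reference, predicted, labels):
    """Count how often each reference phone (row) is classified as each phone (column)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = Counter(zip(reference, predicted))
    return [[counts[(ref, hyp)] for hyp in labels] for ref in labels]


def per_phone_accuracy(matrix, labels):
    """Fraction of correctly classified items for each reference phone."""
    accuracy = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        # A phone that never occurs in the reference has no defined accuracy.
        accuracy[label] = matrix[i][i] / total if total else float("nan")
    return accuracy


# Hypothetical toy data: four phones, eight classified segments.
labels = ["a", "i", "p", "t"]
reference = ["a", "a", "i", "p", "t", "t", "p", "i"]
predicted = ["a", "i", "i", "p", "t", "p", "p", "i"]

matrix = confusion_matrix(reference, predicted, labels)
accuracy = per_phone_accuracy(matrix, labels)

for label, row in zip(labels, matrix):
    print(label, row, f"accuracy={accuracy[label]:.2f}")
```

Rows correspond to reference phones and columns to predicted phones, so off-diagonal cells show which phones are confused with which.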

Data

For confidentiality reasons, datasets are not included. This work relies on the C2SI, CommonPhone and BREF corpora.

How to cite

If you use this work, please cite as:

@inproceedings{maisonneuve24,
  author    = {Malo Maisonneuve and Corinne Fredouille and Muriel Lalain and Alain Ghio and Virginie Woisard},
  title     = {{Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models}},
  year      = 2024,
  booktitle = {Proc. Interspeech 2024}
}

License

MIT License
