evaluation evaluation-metrics fine-tuning metric metrics self-supervised-learning speech-synthesis text-to-speech voice-conversion

MooseNet PLDA

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module, on arXiv.
Accepted to the 12th Speech Synthesis Workshop (SSW 12), Grenoble, 2023.
Presentation slides


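MooseNet is a trainable metric for synthesized speech. We experimented with neural network (NN) models fine-tuned from self-supervised learning (SSL) models and with a probabilistic linear discriminant analysis (PLDA) module. See the MooseNet-PLDA paper for details.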

Installation

# Optional, for reinstallation: leave and delete the existing environment
conda deactivate; rm -rf ./env
# Create a new conda environment in ./env and install the moosenet package in editable mode
conda env create --prefix ./env -f environment.yml \
  && conda activate ./env \
  && pip install -e '.[dev]'
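A quick way to confirm that the editable install worked is to import the package from the new environment. This is a minimal sketch, assuming the importable module is named `moosenet` (the same name as the pip package mentioned above); the helper `check_import` and the `PYTHON` override variable are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits with status 0 if the given Python module imports cleanly.
# Uses $PYTHON if set, otherwise the `python` on PATH (the one in ./env after `conda activate ./env`).
check_import() {
  "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Assumption: the editable package installed with `pip install -e` is importable as `moosenet`.
if check_import moosenet; then
  echo "moosenet is importable"
else
  echo "moosenet is NOT importable; is ./env activated?" >&2
fi
```

Routing the interpreter through a variable keeps the check usable outside the activated environment, e.g. `PYTHON=./env/bin/python check_import moosenet`.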

Reproducing the Experiments

The commands for fine-tuning SSL models (XLS-R and Wav2Vec 2.0) into the MooseNet NN on the English data from the main track can be found in ./main.sh.
The commands for fine-tuning the MooseNet NN on the main-track data together with the Chinese set from the out-of-domain (OOD) track can be found in ./ood.sh.
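The mapping between tracks and reproduction scripts described above can be wrapped in a small helper. This is a hypothetical convenience sketch: the function name `script_for_track` and the labels `main` and `ood` are illustrative, and the usage line assumes the scripts are meant to be executed with bash from the repository root rather than only read as a reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the reproduction script for a given track.
#   main -> ./main.sh  (English main-track data; XLS-R / Wav2Vec 2.0 fine-tuning)
#   ood  -> ./ood.sh   (main-track data plus the Chinese OOD-track set)
script_for_track() {
  case "$1" in
    main) echo "./main.sh" ;;
    ood)  echo "./ood.sh" ;;
    *)    echo "unknown track: $1 (expected 'main' or 'ood')" >&2; return 1 ;;
  esac
}

# Example usage (assumed: run from the repository root with ./env activated):
#   bash "$(script_for_track ood)"
```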

About

Official repo for “MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module”

https://arxiv.org/abs/2301.07087


Apache License 2.0
