oplatek / moosenet-plda

Official repo for “MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module”

Home Page: https://arxiv.org/abs/2301.07087

MooseNet PLDA

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module, on arXiv.
Accepted to Speech Synthesis Workshop 12, 2023, Grenoble.
Presentation slides

 

MooseNet is a trainable metric for synthesized speech. We experimented with self-supervised learning (SSL) neural network models and a probabilistic linear discriminant analysis (PLDA) module. See the MooseNet-PLDA paper.

Installation

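# Optional: remove an existing environment before reinstalling
conda deactivate; rm -rf env
# Create a new conda environment and install the moosenet package in editable mode
# (the quotes around .[dev] keep shells such as zsh from expanding the brackets)
conda env create --prefix ./env -f environment.yml \
  && conda activate ./env \
  && pip install -e ".[dev]"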
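
After installation, it can help to confirm that the editable package is visible to the interpreter in the activated environment. The following is a minimal sketch, not part of the repository: the helper `is_installed` is hypothetical, and the import name `moosenet` is an assumption based on the package name mentioned in the installation comment.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level `package` can be found by the current interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Assumption: the editable install exposes a top-level module named "moosenet".
    name = "moosenet"
    if is_installed(name):
        print(f"{name} is importable")
    else:
        print(f"{name} was not found; check that ./env is activated")
```

Run it with the Python interpreter from `./env`; if the package is not found, the conda environment is probably not active or the `pip install` step did not complete.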

Reproducing the Experiments

  • The commands for fine-tuning SSL models (XLS-R and Wav2Vec 2.0) into the MooseNet NN on the English data from the main track are in ./main.sh.
  • The commands for fine-tuning the MooseNet NN on the main track and the Chinese set from the OOD track are in ./ood.sh.

About

License: Apache License 2.0


Languages

Python 90.4%, Shell 9.6%