seastar105 / pflow-encodec

Implementation of TTS model based on NVIDIA P-Flow TTS Paper

training step

yiwei0730 opened this issue · comments

Hello, this is a very good project.
Could you write up the training steps? I would also like to try it with multiple languages (mainly Mandarin and English).

I'm going to write up the steps for training, and for inference after training, this weekend.

Briefly, though, you can install the dependencies like this:

conda create -n pflow-encodec -y python=3.10
conda activate pflow-encodec
conda install -y pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c conda-forge libsndfile==1.0.31
pip install -r requirements.txt

Then prepare your dataset as TSV files like the one below. Three columns are required (audio_path, duration, text):

audio_path duration text
/path/to/audio1 duration_of_audio1 text1
/path/to/audio2 duration_of_audio2 text2
...
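
A manifest in this shape could be generated with a short script. The following is a minimal sketch, not part of the repository: it assumes the columns are tab-separated with a header row, that durations are in seconds, and that the audio is PCM WAV readable by Python's standard wave module. The function names are hypothetical.

```python
import csv
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_manifest(rows, out_path):
    """Write (audio_path, text) pairs to a TSV with the three required columns.

    Assumption: columns are tab-separated and duration is given in seconds.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["audio_path", "duration", "text"])
        for audio_path, text in rows:
            duration = wav_duration_seconds(audio_path)
            # Store absolute paths so the manifest works from any directory.
            writer.writerow([str(Path(audio_path).resolve()), f"{duration:.3f}", text])
```

For other audio formats, the duration helper would need to be swapped for a library that can read them.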

Then run scripts/dump_latents.py and scripts/dump_durations.py. These scripts dump the EnCodec latents and the character durations.

After dump_latents runs, it prints the global mean and std. You should use these values in your config, like here
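
For intuition, global statistics like these can be accumulated in one pass over all latent values and then used to normalize the training targets. This is a sketch of the general idea only, not the code in scripts/dump_latents.py: it treats each latent as a flat sequence of floats and computes a single scalar mean and std, and all names are hypothetical.

```python
import math


def global_mean_std(latents):
    """Compute the mean and std over every value in an iterable of latents.

    Running sums are used, so each latent can be discarded after it is read.
    """
    count, total, total_sq = 0, 0.0, 0.0
    for latent in latents:
        for value in latent:
            count += 1
            total += value
            total_sq += value * value
    if count == 0:
        raise ValueError("no latent values to aggregate")
    mean = total / count
    # Population variance; clamp at zero to guard against rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def normalize(latent, mean, std):
    """Shift and scale one latent with the global statistics."""
    return [(v - mean) / std for v in latent]
```

Using one global pair of values, rather than per-utterance statistics, keeps the normalization identical at training and inference time.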

Then configure your experiment in the configs/experiment folder. The configs are based on Hydra.

You can then run your experiment with python pflow_encodec/train.py experiment=<experiment name>

Thank you for your reply. I look forward to the more detailed introduction after the weekend.
I would also like to ask two things: where is the code for decoding with the MultiBand-Diffusion model (I can't seem to find it), and can EnCodec be swapped for a better codec, such as the recent FAcodec?

@yiwei0730 you can find generation code in https://github.com/seastar105/pflow-encodec/blob/main/notebooks/generate.ipynb

Of course, it can be used with any continuous representation (mel, DAC, another VAE).

I've updated README.