training step

Question

yiwei0730 opened this issue 3 months ago

Hello, this is a very good project.
Could you write up the training steps? I would also like to try it with multiple languages (mainly Mandarin and English).

HAESUNG JEON · Answer 1 · Thu Mar 28 2024 11:12:51 GMT+0800 (China Standard Time)

I'm going to write up the steps for training, and for inference after training, this weekend.

but briefly, you can install dependencies with this

conda create -n pflow-encodec -y python=3.10
conda activate pflow-encodec
conda install -y pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c conda-forge libsndfile==1.0.31
pip install -r requirements.txt

and prepare your dataset tsv files like below, three columns are required (audio_path, duration, text)

audio_path duration text
/path/to/audio1 duration_of_audio1 text1
/path/to/audio2 duration_of_audio2 text2
...
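As a rough illustration of the manifest format described above, here is a minimal sketch that writes such a file using only the Python standard library. The helper names `wav_duration_seconds` and `write_manifest` are hypothetical and not part of the repository. It also assumes that columns are tab-separated, that a header row is expected, that durations are in seconds, and that the audio is plain PCM WAV; check the README for the exact requirements.

```python
import csv
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_manifest(rows, out_path):
    """Write (audio_path, duration, text) rows as a tab-separated file.

    Assumption: the dump scripts expect a header row with exactly these
    three column names, as in the example above.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["audio_path", "duration", "text"])
        for audio_path, duration, text in rows:
            # Assumption: duration is stored in seconds.
            writer.writerow([str(audio_path), f"{duration:.3f}", text])
```

For other audio formats, the duration would have to come from a library that can read them; the `wave` module only handles WAV.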

then run scripts/dump_latents.py and scripts/dump_durations.py. these scripts dump the encodec latents and the character durations.

after running dump_latents, the global mean and std will be printed out. you should use these values in your config.
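To illustrate what a "global mean and std" could look like, here is a minimal sketch, not the code in scripts/dump_latents.py. It assumes a single scalar mean and a population standard deviation taken over every value of every latent; the actual script may compute them differently (for example per channel), so use the numbers it prints. The function names `global_mean_std` and `normalize` are hypothetical.

```python
import math


def global_mean_std(latents):
    """Scalar mean and population std over every value in every latent.

    `latents` is an iterable of 2-D nested lists (frames x channels).
    Running sums are used so all latents never need to be held at once.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for latent in latents:
        for frame in latent:
            for value in frame:
                count += 1
                total += value
                total_sq += value * value
    mean = total / count
    # Clamp at zero to guard against tiny negative values from rounding.
    var = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(var)


def normalize(latent, mean, std):
    """Shift and scale one latent with the global statistics."""
    return [[(v - mean) / std for v in frame] for frame in latent]
```

Statistics like these are typically used to bring the latents to roughly zero mean and unit variance before training, which is presumably why they need to be set in the config.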

then configure your experiment in configs/experiment folder. config is based on hydra.

you can then run your experiment with: python pflow_encodec/train.py experiment=<experiment name>

yiwei0730 · Answer 2 · Thu Mar 28 2024 11:38:40 GMT+0800 (China Standard Time)

Thank you for your reply. I look forward to your more detailed introduction after the weekend.
I would also like to ask about decoding with the MultiBand-Diffusion model (can you tell me where the code is? I can't seem to find it), and whether encodec could be replaced with a better codec, such as the latest FAcodec.

HAESUNG JEON · Answer 3 · Thu Mar 28 2024 11:52:39 GMT+0800 (China Standard Time)

@yiwei0730 you can find generation code in https://github.com/seastar105/pflow-encodec/blob/main/notebooks/generate.ipynb

of course, the model can be used with any continuous representation (mel, dac, other vae).

HAESUNG JEON · Answer 4 · Sun Mar 31 2024 20:49:08 GMT+0800 (China Standard Time)

I've updated README.