training step
yiwei0730 opened this issue · comments
Hello, this is a very good project.
Could you write up the training steps? I would also like to try using it with multiple languages (Mandarin and English as the main focus).
I'm going to write up steps for training, and for inference after training, this weekend.
But briefly, you can install the dependencies with:
conda create -n pflow-encodec -y python=3.10
conda activate pflow-encodec
conda install -y pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c conda-forge libsndfile==1.0.31
pip install -r requirements.txt
Then prepare your dataset TSV file as below; three columns are required (audio_path, duration, text):
audio_path duration text
/path/to/audio1 duration_of_audio1 text1
/path/to/audio2 duration_of_audio2 text2
...
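The TSV above can be generated with a short script. This is a hedged sketch, not part of the repo: `wav_duration` and `write_tsv` are hypothetical helpers, and it assumes standard `.wav` inputs plus a `transcripts` dict mapping each audio path to its text.

```python
# Sketch: build the dataset TSV described above from a folder of .wav files.
# Hypothetical helpers, not part of pflow-encodec.
import csv
import wave
from pathlib import Path


def wav_duration(path: str) -> float:
    """Return duration in seconds using the stdlib wave module."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def write_tsv(audio_dir: str, transcripts: dict, out_path: str) -> None:
    """Write a (audio_path, duration, text) TSV for all .wav files in audio_dir."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["audio_path", "duration", "text"])
        for wav in sorted(Path(audio_dir).glob("*.wav")):
            writer.writerow(
                [str(wav), f"{wav_duration(str(wav)):.3f}", transcripts[str(wav)]]
            )
```

For real corpora you would likely read durations with a faster library (e.g. soundfile) and pull transcripts from your corpus metadata instead of a dict.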
Then run scripts/dump_latents.py and scripts/dump_durations.py. These scripts dump out EnCodec latents and character durations.
After running dump_latents.py, the global mean and std will be printed out. You should use these values in your config.
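Roughly, the printed statistics would be pasted into your experiment YAML. This is only a sketch: the key names (`data`, `mean`, `std`) and the numbers are placeholders, not the repo's actual schema, so check the existing configs in configs/experiment for the real field names.

```yaml
# Hypothetical fragment of a configs/experiment YAML.
# Key names and values are placeholders; match them to the repo's configs.
data:
  mean: 0.0   # replace with the global mean printed by dump_latents.py
  std: 1.0    # replace with the global std printed by dump_latents.py
```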
Then configure your experiment in the configs/experiment folder; the config system is based on Hydra.
You can run your experiment with: python pflow_encodec/train.py experiment=<experiment name>
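Since the config is Hydra-based, individual fields can also be overridden from the command line. A hedged example follows: the experiment name and the override keys shown are placeholders, not names guaranteed to exist in this repo's configs.

```shell
# Launch training with Hydra-style overrides.
# "my_experiment" and the override keys below are placeholders;
# use the names defined in your own configs/experiment file.
python pflow_encodec/train.py experiment=my_experiment \
    trainer.max_steps=100000 \
    data.batch_size=16
```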
Thank you for your reply; I look forward to your more detailed introduction after the weekend.
I would like to ask about decoding with the MultiBand-Diffusion model (can you tell me where the code is? I can't seem to find it), and whether EnCodec could be swapped for a better codec, such as the latest FAcodec.
@yiwei0730 you can find generation code in https://github.com/seastar105/pflow-encodec/blob/main/notebooks/generate.ipynb
Of course, it can be used with any continuous representation (mel, DAC, other VAEs).
I've updated README.