p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper

Home Page: https://neurips.cc/virtual/2023/poster/69899

Is it possible to change the voice or clone a new voice with just one or more wav files?

lpscr opened this issue

Hi @p0p4k, thank you very much for this amazing work!
I am wondering whether it is possible to use a wav file to choose the speaker. For example, I train on one speaker, and once I have a good model,
I can easily change the voice just by supplying a wav file, something like FreeVC, if you know it.

In the Colab interface I see this part of the code:

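import glob

import torchaudio

from pflow.data.text_mel_datamodule import mel_spectrogram

# path_waves must already be defined and end with a slash;
# fill in the path to the LJSpeech-1.1 dataset
wav_files = glob.glob(path_waves + "*.wav")

# Load the first wav file and compute its mel spectrogram
wav, sr = torchaudio.load(wav_files[0])
mel = mel_spectrogram(
    wav,
    1024,
    80,
    22050,
    256,
    1024,
    0,
    8000,
    center=False,
)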

I wonder whether it is possible to use another voice here. I tried a different wav file for the speaker, but the voice is always the same as the speaker I trained on; it does not change.

It would be great if this were possible.

Thank you.

commented

According to the paper's demo page, it should be possible when we train the base model on many speakers and use a slightly larger model. https://pflow-demo.github.io/projects/pflow/#:~:text=The%20samples%20below,P%2DFlow%20(v1.5)

Hi @p0p4k, thank you very much for the quick reply. I will try training with multiple speakers to see how it goes.
Just to confirm: I think the dataset uses the same format as vits2:

wav file|speaker_id|text
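
A filelist like this can be sanity-checked before training. The sketch below is only an illustration, assuming the pipe-separated layout above with an integer speaker ID; `parse_filelist` is a hypothetical helper and not part of pflowtts_pytorch.

```python
from collections import Counter


def parse_filelist(lines, sep="|"):
    """Parse 'wav_path|speaker_id|text' lines (assumed layout, hypothetical helper).

    Returns the parsed entries and the number of utterances per speaker.
    """
    entries = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split at most twice so a '|' inside the text field is preserved
        parts = line.split(sep, 2)
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(parts)}")
        wav_path, speaker_id, text = parts
        # int() raises ValueError if the speaker ID is not an integer
        entries.append((wav_path, int(speaker_id), text))
    counts = Counter(speaker for _, speaker, _ in entries)
    return entries, counts
```

The number of distinct keys in `counts` is the speaker count of the filelist, which may be useful when editing the number of speakers in the yaml file.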

I ran the same script I used for single-speaker training, switched the yaml file to vctk.yaml, and changed the file paths, the number of speakers, etc. inside it:

python pflow/train.py experiment=vctk.yaml

Will this work? Can I train like this?

Once multi-speaker training is complete and I want to run the model, do I just change the wav as I described above, or do I need extra code to change the voice?

Thank you.

commented

Same code mostly. You might have to calculate stats before training for your dataset and replace numbers in the yaml file.
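
As a rough illustration of "calculating stats", here is a minimal sketch that computes one global mean and standard deviation over all mel-spectrogram values in a dataset. It assumes the mels are already available as arrays; `mel_mean_std` is a hypothetical helper, and whether the repo expects exactly these statistics (and under which yaml keys) is an assumption to check against the repo's own tooling.

```python
import numpy as np


def mel_mean_std(mels):
    """Global mean and population std over an iterable of mel arrays.

    Hypothetical helper: accumulates running sums so the whole dataset
    never has to be held in memory at once.
    """
    total = 0.0
    total_sq = 0.0
    count = 0
    for mel in mels:
        mel = np.asarray(mel, dtype=np.float64)
        total += mel.sum()
        total_sq += (mel ** 2).sum()
        count += mel.size
    if count == 0:
        raise ValueError("no mel values were provided")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp at 0 to guard against rounding error
    std = np.sqrt(max(total_sq / count - mean ** 2, 0.0))
    return float(mean), float(std)
```

The two returned numbers would then replace the corresponding statistics in the dataset's yaml file.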

@p0p4k Great, thank you so much for your help. I love your work! I will try multi-speaker training in the next few days. Right now I am training a single speaker and am at about 150k. When that finishes, can I use the single-speaker model to fine-tune a multi-speaker one, or do I need to start from scratch again?

Here is how my single-speaker training looks so far. Is it OK?
p1

p2

I notice there is an empty area, which I have marked in orange. Is this OK? In the image on your front page, the plot fills the whole area.
p3