Sample rate of reference audio for cloning

Question

regstuff opened this issue 6 months ago

Hi,
I tried the Colab link to clone a voice with a WAV file. I wasn't able to get things to work with 48 kHz, 16 kHz, or 8 kHz sample-rate files. Any clues as to what the actual format should be?
This is the error I get:

RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9105142690b8> in <cell line: 9>()
      7 ref_clips = glob.glob(path)
      8 
----> 9 audio,sr = infer_tts(text,ref_clips,diffuser_en,diff_model_en,ts_model_en,vocoder_en)
     10 
     11 write('/content/test.wav',sr,audio)

3 frames
/usr/local/lib/python3.10/dist-packages/maha_tts/utils/stft.py in transform(self, input_data)
     50 
     51         # similar to librosa, reflect-pad the input
---> 52         input_data = input_data.view(num_batches, 1, num_samples)
     53         input_data = F.pad(
     54             input_data.unsqueeze(1),

RuntimeError: shape '[1, 1, 137686]' is invalid for input of size 275372

Jaskaran Singh · Answer 1 · Sat Jan 20 2024 19:06:21 GMT+0800 (China Standard Time)

The sampling rate should be 22050 Hz.
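
For anyone landing here with the same error, below is a rough sketch of converting a reference clip to 22050 Hz before passing it to `infer_tts`. One observation from the traceback: 275372 is exactly 2 × 137686, which would be consistent with a two-channel (stereo) file, so the sketch also downmixes to mono. That part is an inference from the numbers, not something confirmed in this thread. The helper names (`to_mono`, `resample_linear`, `convert_wav`) are made up for illustration and are not part of maha_tts. The resampling is plain linear interpolation with no anti-aliasing filter, and only 16-bit PCM WAV input is handled; a proper resampler such as `librosa.load(path, sr=22050, mono=True)` will likely sound better.

```python
import array
import sys
import wave

TARGET_SR = 22050  # sample rate given in the answer above


def to_mono(samples, channels):
    """Average interleaved channels into a single channel."""
    if channels == 1:
        return list(samples)
    return [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


def resample_linear(samples, src_sr, dst_sr):
    """Resample by linear interpolation (crude: no anti-aliasing filter)."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_sr / src_sr))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr      # fractional position in the source
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(int(round(samples[lo] * (1 - frac) + samples[hi] * frac)))
    return out


def convert_wav(src_path, dst_path, target_sr=TARGET_SR):
    """Rewrite a 16-bit PCM WAV file as mono at target_sr."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = src.getnchannels()
        src_sr = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    mono = resample_linear(to_mono(samples, channels), src_sr, target_sr)

    out = array.array("h", mono)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_sr)
        dst.writeframes(out.tobytes())
```

Usage would be something like `convert_wav('/content/ref.wav', '/content/ref_22050.wav')` (paths are placeholders), then pointing the `glob` pattern at the converted file.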