dubverse-ai / MahaTTS

Sample rate of reference audio for cloning

regstuff opened this issue

Hi,
I tried the Colab link to clone a voice from a WAV file, but couldn't get it to work with files sampled at 48 kHz, 16 kHz, or 8 kHz. Any clues as to what the actual format should be?
This is the error I get:

RuntimeError                              Traceback (most recent call last)
[<ipython-input-7-9105142690b8>](https://localhost:8080/#) in <cell line: 9>()
      7 ref_clips = glob.glob(path)
      8 
----> 9 audio,sr = infer_tts(text,ref_clips,diffuser_en,diff_model_en,ts_model_en,vocoder_en)
     10 
     11 write('/content/test.wav',sr,audio)

3 frames
[/usr/local/lib/python3.10/dist-packages/maha_tts/utils/stft.py](https://localhost:8080/#) in transform(self, input_data)
     50 
     51         # similar to librosa, reflect-pad the input
---> 52         input_data = input_data.view(num_batches, 1, num_samples)
     53         input_data = F.pad(
     54             input_data.unsqueeze(1),

RuntimeError: shape '[1, 1, 137686]' is invalid for input of size 275372

The sampling rate should be 22050 Hz.
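
For anyone hitting the same error, here is a minimal sketch of converting a reference clip to 22050 Hz before passing it to `infer_tts`. It uses NumPy and SciPy rather than anything from MahaTTS, and the helper name `to_mono_22050` is made up for illustration. It also downmixes to a single channel. That part is an inference, not something confirmed in this thread: in the traceback, 275372 is exactly 2 × 137686, which is what a two-channel clip would produce.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 22050  # rate given in the reply above


def to_mono_22050(samples: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: return a 1-D float32 signal at 22050 Hz."""
    x = np.asarray(samples)

    # Scale signed-integer PCM (e.g. int16) to roughly [-1, 1).
    if np.issubdtype(x.dtype, np.signedinteger):
        x = x.astype(np.float32) / float(np.iinfo(x.dtype).max + 1)
    else:
        x = x.astype(np.float32)

    # scipy.io.wavfile.read returns (frames, channels) for multi-channel
    # files; average the channels to get mono (assumption: mono is expected).
    if x.ndim == 2:
        x = x.mean(axis=1)

    # Polyphase resampling by the reduced ratio TARGET_SR / sr.
    if sr != TARGET_SR:
        g = gcd(TARGET_SR, sr)
        x = resample_poly(x, TARGET_SR // g, sr // g)

    return x.astype(np.float32)
```

To use it, read the original clip with `scipy.io.wavfile.read`, pass the returned rate and samples to the helper, write the result with `scipy.io.wavfile.write` at 22050, and point `ref_clips` at the new file.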