collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Home Page: https://collabora.github.io/WhisperSpeech/

fine-tuning.

HobisPL opened this issue · comments

Can you write more about training, e.g. what the dataset should look like? I see that you are from Poland; do you plan to add more Polish voices? The current model struggles with accents and style.

I don't have more Polish data that is permissively licensed. One thing I am looking forward to is adding more languages – hopefully this would improve performance on all languages, like it did for Whisper.

Sure, I understand. Will you provide any instructions on how to do fine-tuning and what the TXT/CSV file should look like? Is this a standard format?
`audio_file_name|text|speaker_name`
Alternatively, should I create a Google Colab notebook for this?
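For reference, the pipe-separated layout suggested above (the LJSpeech-style convention) could be parsed and sanity-checked with a few lines of Python. This is only a sketch of that hypothetical format; WhisperSpeech's actual training pipeline may expect something different:

```python
import csv

def parse_metadata(lines):
    """Parse LJSpeech-style metadata lines: audio_file_name|text|speaker_name.

    Raises ValueError on rows that don't have exactly three fields, so
    malformed entries are caught before training starts.
    """
    rows = []
    for fields in csv.reader(lines, delimiter="|"):
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {fields}")
        audio, text, speaker = (f.strip() for f in fields)
        rows.append({"audio": audio, "text": text, "speaker": speaker})
    return rows

sample = [
    "clips/0001.wav|Dzień dobry.|speaker_a",
    "clips/0002.wav|Hej, hur mår du?|speaker_b",
]
print(parse_metadata(sample)[0]["speaker"])  # speaker_a
```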

I'm interested in doing this for Swedish; I found some audiobooks I could use.
But I would be interested in what kind of hardware it requires, the expected training time, and so on.
Are there any resources on this?

I am working on writing down the full process for data preprocessing. It's a bit involved because we need to scale it to thousands of hours, but for smaller fine-tuning datasets someone should be able to put all of it into a single notebook with a reasonable runtime.

If I want to add a new language to WhisperSpeech, will fine-tuning achieve that? Also, is the audio in the dataset limited to a single speaker? It's difficult to find a 1000-hour dataset with only one speaker... If different speakers all speak the same language, will it work?

@jpc

Any update on this? How can I fine-tune if I have a Chinese audio dataset?

@jpc
Please also tell me the dataset requirements, as mentioned above. Thank you.