What is the recommended format for the audio files used for training?

CreateTheImaginable opened this issue 8 months ago

Create The Imaginable commented 8 months ago

What audio format does Stable Audio use for training? More importantly, which audio file format is recommended, and what settings or parameters should those files use?

Dion Timmer · Answer 1 · Thu Oct 19 2023 01:14:37 GMT+0800 (China Standard Time)

For proper training you would probably want to stick to WAVs, since some MP3s have issues being read, but your data might be totally fine.

You would want full-quality, 16-bit lossless WAV files for the best compatibility. The sample rate is configurable per model; all data will be resampled to that sample rate regardless.
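To check whether a dataset already matches the 16-bit WAV advice above, a minimal sketch using only Python's standard-library `wave` module could look like the following. The helper names `describe_wav` and `is_16bit_wav` are made up for illustration and are not part of Stable Audio's tooling; the `wave` module also only reads uncompressed PCM WAVs, so other files are simply reported as not matching.

```python
import wave


def describe_wav(path):
    """Return basic format facts for a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            # getsampwidth() returns bytes per sample, so 2 means 16-bit
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": sample_rate,
            "duration_seconds": frames / sample_rate,
        }


def is_16bit_wav(path):
    """True if the file opens as PCM WAV and uses 16-bit samples."""
    try:
        return describe_wav(path)["bit_depth"] == 16
    except (wave.Error, EOFError):
        # Not a WAV the stdlib reader understands (e.g. MP3 or float WAV)
        return False
```

Running `is_16bit_wav` over every file in a folder gives a quick list of files that may need converting before training. The sample rate does not need to match the model, since the data is resampled anyway.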

Create The Imaginable · Answer 2 · Thu Oct 19 2023 07:56:29 GMT+0800 (China Standard Time)

@diontimmer great info! It might be good to post some guidelines: use the WAV file type, use 16-bit lossless WAV files for best compatibility, and note that the sample rate is configurable.

Is there an optimum sample rate?

I think it would be good to establish a baseline that could become the standard. I am sure people will experiment and optimize, but a baseline would be useful for comparison.

Dion Timmer · Answer 3 · Thu Oct 19 2023 10:17:54 GMT+0800 (China Standard Time)

Yes, most modern audio is at a sample rate of either 44100 or 48000 Hz. I personally prefer 44100 for the longer length; the quality difference is not noticeable to me.
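One way to read the "longer length" remark, assuming the model is trained on clips with a fixed number of samples: the same sample count covers more time at a lower sample rate. The sketch below just does that arithmetic. The value `NUM_SAMPLES = 65536` is a hypothetical example, not a setting taken from this thread.

```python
def clip_seconds(num_samples, sample_rate):
    """Duration in seconds covered by a fixed number of samples."""
    return num_samples / sample_rate


NUM_SAMPLES = 65536  # hypothetical training clip size, in samples

for rate in (44100, 48000):
    print(f"{rate} Hz: {clip_seconds(NUM_SAMPLES, rate):.3f} s")
# 44100 Hz: 1.486 s
# 48000 Hz: 1.365 s
```

Under that assumption, 44100 Hz gives roughly 9% more audio per clip than 48000 Hz for the same sample count.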