Stability-AI / stable-audio-tools

Generative models for conditional audio generation


What is the recommended format for audio files used for training?

CreateTheImaginable opened this issue.

What audio format does Stable Audio use for training? More importantly, what audio file format is recommended, and what settings or parameters should those files have?

For training, you would probably want to stick to WAV files, since some MP3s have issues being read. Your data might be totally fine, though.

You would want full-quality, 16-bit lossless WAV files for the best compatibility. The sample rate is configurable per model, and all data will be resampled to that sample rate regardless.
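Before training, you could run a quick pre-flight check to flag files that are not 16-bit PCM WAVs. The script below is a minimal sketch that uses only Python's standard-library `wave` module, which reads uncompressed PCM WAV files only. The function name `check_wav` and the `expected_rate` parameter are hypothetical and are not part of stable-audio-tools. Training resamples everything to the model's rate anyway, so the script reports a sample-rate mismatch as a note rather than as an error.

```python
import sys
import wave
from pathlib import Path


def check_wav(path, expected_rate=None):
    """Return a list of problems found in a WAV file (empty list = looks fine).

    Hypothetical helper, not part of stable-audio-tools. It relies only on the
    standard-library `wave` module, which reads uncompressed PCM WAV only.
    """
    problems = []
    try:
        with wave.open(str(path), "rb") as wf:
            sample_width = wf.getsampwidth()  # bytes per sample
            rate = wf.getframerate()          # samples per second
    except (wave.Error, EOFError) as exc:
        # Non-PCM (e.g. 32-bit float), truncated, or non-WAV files end up here.
        return [f"unreadable as PCM WAV: {exc}"]

    if sample_width != 2:
        problems.append(f"{sample_width * 8}-bit samples, expected 16-bit")
    if expected_rate is not None and rate != expected_rate:
        # Not fatal: the data is resampled to the model's rate anyway.
        problems.append(f"sample rate {rate} Hz will be resampled to {expected_rate} Hz")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_wavs.py /path/to/dataset
    for p in sorted(Path(sys.argv[1]).rglob("*.wav")):
        for msg in check_wav(p, expected_rate=44100):
            print(f"{p}: {msg}")
```

If you have MP3s, converting them to 16-bit WAV with a tool of your choice before running this check sidesteps the MP3 decoding issues mentioned above.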

@diontimmer Great info! It might be good to post some guidelines recommending 16-bit lossless WAV files for the best compatibility and noting that the sample rate is configurable.

Is there an optimum sample rate?

I think it would be good to establish a baseline that could become the standard. I am sure people will experiment and optimize, but a baseline would be useful for comparison.

Yes, most modern audio uses a sample rate of either 44,100 Hz or 48,000 Hz. I personally prefer 44,100 Hz for the longer length, since a fixed number of samples covers more time at a lower rate. The quality difference is not noticeable to me.
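Here is why a lower sample rate buys length. Suppose a model is trained on windows with a fixed number of samples. The duration each window covers is that sample count divided by the sample rate. The sketch below assumes a hypothetical window of 2,097,152 samples (2^21), so substitute whatever window size your own model config uses.

```python
def window_seconds(sample_size: int, sample_rate: int) -> float:
    """Duration in seconds covered by a fixed-size window of audio samples."""
    return sample_size / sample_rate


# Hypothetical window size (2**21 samples); use your model's own value.
SAMPLE_SIZE = 2**21

for rate in (44100, 48000):
    print(f"{rate} Hz: {window_seconds(SAMPLE_SIZE, rate):.2f} s")
# prints:
# 44100 Hz: 47.55 s
# 48000 Hz: 43.69 s
```

For any fixed window, 44,100 Hz covers 48000 / 44100 ≈ 1.088 times as much audio as 48,000 Hz, or roughly 8.8% more.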