AMAAI-Lab / mustango

Mustango: Toward Controllable Text-to-Music Generation

Training device and time

fundwotsai2001 opened this issue · comments

Thank you for open-sourcing your implementation. I'm curious what training devices you used and how much time is required to train the model.

After installing the required packages, you can run accelerate config from your terminal and set up your run configuration by answering the questions asked. Here, you can specify the devices to use for training.

Then, running the accelerate launch command given for training will execute the code on the devices you selected.
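For reference, here is a minimal sketch of that workflow; `train.py` and its arguments are placeholders, so use the exact launch command from the repository's README:

```bash
# One-time: answer the interactive prompts to choose the machine type,
# number of GPUs, mixed precision, etc.
accelerate config

# Launch training; accelerate reads the saved configuration.
# --num_processes can also be overridden on the command line.
accelerate launch --num_processes 4 train.py <training arguments from the README>
```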

We trained our various models on 4-8 GPUs. @Dapwner can provide more details about the training time.

Hello! Naturally, training time depends on the effective batch size used and available GPU resources.

In our case, we trained the Tango (not Mustango) baseline models on 4 Nvidia Tesla V100 GPUs (32 GB memory) with an effective batch size of 32 (batch=2, devices=4, accumulation steps=4), which came to roughly 2 hours per epoch of MusicBench.

For Mustango, since it is bigger, we used 8 RTX 8000 GPUs and an effective batch size of 32 (batch=2, devices=8, accumulation steps=2). This time training took longer than expected, which could be due to the different GPUs or to other settings and issues; the speed was about 5 hours per epoch of MusicBench.
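In both setups the effective batch size works out to per-device batch × number of devices × gradient-accumulation steps, i.e. 2 × 4 × 4 = 32 and 2 × 8 × 2 = 32.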

Training for 20 epochs should give you something sufficient, but we trained for 40 or even more, so you can expect training times of roughly 4 to 10 days for a good output. :-)
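As a rough check against those numbers: at about 5 hours per MusicBench epoch on the Mustango setup, 20 epochs come to around 100 hours (≈4 days) and 40 epochs to around 200 hours (≈8 days), which matches the 4 to 10 day range above.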