Using multiple GPUs vs single GPU

Question

avinashsai opened this issue 2 years ago

Hi,

Congratulations on the amazing work. Will there be any difference in performance if I use just a single GPU, and what changes would I need to make in a config such as msvd_qa.json?

Thank you.

Dongxu · Answer 1 · Wed Jun 15 2022 10:55:09 GMT+0800 (China Standard Time)

Using a single GPU ends up with a smaller number of samples per batch, so you might need to increase "gradient_accumulation_steps" to simulate the multi-GPU case. See how it works here. Conceptually, this should give you the same results after a longer training time, though non-determinism might also cause minor performance differences.
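
To make the arithmetic concrete, here is a rough sketch of how one might pick the new "gradient_accumulation_steps" value. It assumes the batch size in the config is per GPU and that the effective batch is per-GPU batch × number of GPUs × accumulation steps. The helper names and the numbers (8 GPUs, batch size 16) are made up for illustration and are not taken from the repo or from msvd_qa.json.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, accumulation_steps: int) -> int:
    """Samples contributing to one optimizer update (assumes a per-GPU batch size)."""
    return per_gpu_batch * num_gpus * accumulation_steps


def accumulation_steps_for(original_gpus: int, available_gpus: int, original_steps: int = 1) -> int:
    """Accumulation steps needed on `available_gpus` to match the original effective batch.

    Hypothetical helper: keeps the per-GPU batch size fixed and only changes the steps.
    """
    total = original_gpus * original_steps
    if total % available_gpus != 0:
        raise ValueError("cannot match the original effective batch size exactly")
    return total // available_gpus


# Hypothetical setup: a config tuned for 8 GPUs, batch size 16 per GPU, no accumulation.
steps = accumulation_steps_for(original_gpus=8, available_gpus=1)
print(steps)                                 # 8
print(effective_batch_size(16, 8, 1))        # 128 on the original 8-GPU setup
print(effective_batch_size(16, 1, steps))    # 128 on a single GPU
```

With the per-GPU batch size left unchanged, each optimizer update still sees the same number of samples; it just takes more forward/backward passes to get there, which is why training takes longer.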