DeepSpeed and training code print different throughput

Question

wintersurvival opened this issue 2 years ago · comments

When training with 8 GPUs, the throughput printed by DeepSpeed is much smaller than the throughput calculated by the training code:
deepspeed SamplesPerSec=505
sample_per_sec: 50120

It seems that the throughput calculated by the training code equals the throughput printed by DeepSpeed multiplied by gradient_steps.
Which number is accurate? @lucidrains @janEbert
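
To make the suspected relationship concrete, here is a minimal sketch of how two counters can differ by exactly a factor of gradient_steps. It is only an illustration under assumptions: it does not reproduce DeepSpeed's or the training code's actual calculation, and the function name `throughput_pair` and all its parameters are hypothetical. Both counters time the same optimizer step, but one credits every micro-batch in that step while the other credits only one.

```python
def throughput_pair(micro_batch_size, num_gpus, gradient_steps, seconds_per_micro_step):
    """Return (all_micro_batches, single_micro_batch) samples/sec for one optimizer step.

    Hypothetical helper: it only illustrates how a factor of `gradient_steps`
    can appear between two throughput counters. It is not DeepSpeed's code.
    """
    # Wall-clock time of one optimizer step: gradient_steps micro-steps in a row.
    step_time = gradient_steps * seconds_per_micro_step

    # Counter A credits every sample processed during the optimizer step.
    samples_all = micro_batch_size * num_gpus * gradient_steps
    all_micro_batches = samples_all / step_time

    # Counter B credits only one micro-batch (across all GPUs) per optimizer step.
    samples_single = micro_batch_size * num_gpus
    single_micro_batch = samples_single / step_time

    return all_micro_batches, single_micro_batch


if __name__ == "__main__":
    # Made-up values, chosen only to keep the arithmetic easy to follow.
    a, b = throughput_pair(micro_batch_size=8, num_gpus=8,
                           gradient_steps=4, seconds_per_micro_step=0.5)
    print(a, b, a / b)  # the ratio equals gradient_steps
```

For the numbers logged above, 50120 / 505 is roughly 99.25, so comparing that ratio against the configured gradient_steps would show whether this explanation fits.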

janEbert · Answer 1 · Thu Jan 20 2022 19:00:57 GMT+0800 (China Standard Time)

Hey! @rom1504 implemented that calculation. :)