Gradient accumulation

Question

ShawnChang-ei opened this issue 8 months ago · comments

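I have a question about the gradient accumulation code.
Shouldn't model.step() come after if (step + 1) % args.gradient_accumulation_steps == 0:? Otherwise gradient accumulation would not take effect, right?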
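For reference, the pattern the question has in mind can be sketched in plain Python. This is only an illustration on a toy scalar model, not the repository's code: the function name, the data, and the learning rate are all made up for the example. The weight update sits behind the (step + 1) % accumulation_steps check, so it runs once per group of micro-batches.

```python
def train_with_manual_accumulation(data, accumulation_steps, lr=0.1, w=0.0):
    """Toy scalar model y = w * x trained with squared error (illustrative only)."""
    grad_sum = 0.0
    updates = 0
    for step, (x, y) in enumerate(data):
        # Gradient of (w*x - y)**2 with respect to w, scaled so that the
        # accumulated value is an average over the micro-batches.
        grad_sum += 2.0 * (w * x - y) * x / accumulation_steps
        if (step + 1) % accumulation_steps == 0:
            w -= lr * grad_sum  # the optimizer step happens only here
            grad_sum = 0.0      # plays the role of zero_grad()
            updates += 1
    return w, updates


# Hypothetical data: four micro-batches drawn from y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0), (2.0, 4.0)]
w, updates = train_with_manual_accumulation(data, accumulation_steps=2)
```

With four micro-batches and accumulation_steps=2, the weight is updated twice rather than four times.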

Rooders · Answer 1 · Thu Nov 16 2023 17:55:04 GMT+0800 (China Standard Time)

I have the same question. Have you tested whether it makes a difference?

logCong · Answer 2 · Tue Dec 12 2023 23:29:14 GMT+0800 (China Standard Time)

This is already handled inside DeepSpeed.
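A rough sketch of what "handled inside" could look like, assuming the engine counts micro-steps itself and applies the update only at an accumulation boundary. ToyEngine and all of its attributes are hypothetical names for this illustration. It is not DeepSpeed's actual implementation, only a toy showing why an unconditional step() call can still be correct.

```python
class ToyEngine:
    """Hypothetical stand-in for an engine that owns the accumulation logic."""

    def __init__(self, w, lr, accumulation_steps):
        self.w = w
        self.lr = lr
        self.accumulation_steps = accumulation_steps
        self.micro_steps = 0
        self.grad_sum = 0.0
        self.updates = 0

    def backward(self, x, y):
        # Accumulate the scaled gradient of (w*x - y)**2 with respect to w.
        self.grad_sum += 2.0 * (self.w * x - y) * x / self.accumulation_steps

    def step(self):
        # Safe to call after every micro-batch: the weight changes only
        # when an accumulation boundary is reached.
        self.micro_steps += 1
        if self.micro_steps % self.accumulation_steps == 0:
            self.w -= self.lr * self.grad_sum
            self.grad_sum = 0.0
            self.updates += 1


engine = ToyEngine(w=0.0, lr=0.1, accumulation_steps=2)
for x, y in [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0), (2.0, 4.0)]:
    engine.backward(x, y)
    engine.step()  # called unconditionally, as in the code being asked about
```

Here step() is called four times but the weight is updated only twice, so the training loop does not need its own if check.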