WangRongsheng / Aurora

🐳 Aurora is a [Chinese Version] MoE model: further work based on Mixtral-8x7B that activates the model's chat capability in the Chinese open domain.

Home Page: https://arxiv.org/abs/2312.14557

what optimization strategy is used?

g-h-chen opened this issue

Hi Rongsheng,
Thanks for your work! I'm wondering what optimization strategy is used (ZeRO-1/2/3)?

Also, can you reveal how many GPU hours you used in your training?

> Hi Rongsheng, Thanks for your work! I'm wondering what optimization strategy is used (ZeRO-1/2/3)?

@g-h-chen Hi, this should be helpful for you: https://github.com/WangRongsheng/Aurora?tab=readme-ov-file#train
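For context, this kind of single-GPU fine-tuning of a large MoE model is usually done with parameter-efficient LoRA adapters on a frozen base. The sketch below is a generic illustration using transformers + peft, not the repository's actual training script (see the linked README section for that); the model id, LoRA target modules, and hyperparameters are assumptions.

```python
# Illustrative only: a generic LoRA fine-tuning setup, not Aurora's training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed base model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA keeps the base weights frozen and trains small adapter matrices,
# which is what makes fine-tuning an 8x7B MoE on one GPU feasible.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```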

> Also, can you reveal how many GPU hours you used in your training?

We used a single NVIDIA H100. The training time information is here: https://huggingface.co/wangrongsheng/Aurora/blob/main/train_results.json
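If you want to convert that file into GPU hours, a small snippet like the one below can do it; it assumes the standard HuggingFace Trainer output fields (e.g. `train_runtime` in seconds), so adjust the key if the file differs.

```python
# Read the reported training time from a downloaded copy of train_results.json.
import json

with open("train_results.json") as f:
    results = json.load(f)

runtime_s = results.get("train_runtime")  # assumed HF Trainer field, in seconds
if runtime_s is not None:
    print(f"Training took {runtime_s / 3600:.1f} GPU hours on one H100")
```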

Thanks for your reply. I read the source code but found no sign of any ZeRO optimization being used. Is this the case? Did I miss anything?

We support DeepSpeed ZeRO, but we don't use it. We will update the README; you can check it later.
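For anyone who does want to turn ZeRO on, the sketch below shows roughly how a ZeRO-2 configuration could be passed to the HuggingFace Trainer. This is not part of Aurora's released setup, and the batch sizes and offload settings are assumptions.

```python
# Not used by Aurora (per the reply above); shown only as a sketch of enabling
# DeepSpeed ZeRO-2 through the HuggingFace Trainer. "auto" values are filled in
# from TrainingArguments at runtime. Requires the deepspeed package.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,                            # ZeRO-2: shard optimizer state + gradients
        "offload_optimizer": {"device": "cpu"},
        "overlap_comm": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed=ds_config,             # accepts a dict or a path to a JSON config
)
```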

Roger that! Thanks!