What optimization strategy is used?
g-h-chen opened this issue · comments
Hi Rongsheng,
Thanks for your work! I'm wondering which optimization strategy is used (ZeRO-1/2/3)?
Also, could you share how many GPU hours the training took?
@g-h-chen Hi, the training section of the README should be helpful for you: https://github.com/WangRongsheng/Aurora?tab=readme-ov-file#train
We use a single NVIDIA H100. The training time information is here: https://huggingface.co/wangrongsheng/Aurora/blob/main/train_results.json
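For readers who want to turn that file into GPU hours, here is a minimal sketch. It assumes the file follows the usual Hugging Face Trainer layout, where `train_runtime` is the wall-clock training time in seconds. The helper name `gpu_hours` and the sample values are hypothetical and are not the real numbers from the linked file.

```python
import json


def gpu_hours(train_results: dict, num_gpus: int = 1) -> float:
    """Convert wall-clock training time (seconds) into GPU hours.

    Assumes `train_results` has a `train_runtime` key in seconds,
    as the Hugging Face Trainer normally writes it.
    """
    return train_results["train_runtime"] / 3600.0 * num_gpus


# Hypothetical contents for illustration only, NOT the values in the linked file.
example = json.loads('{"epoch": 3.0, "train_runtime": 7200.0}')

# A single GPU, as stated in the reply above.
print(gpu_hours(example, num_gpus=1))
```

With one GPU, GPU hours are simply the runtime in hours; multiply by the device count for multi-GPU runs.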
Thanks for your reply. I read the source code but found no sign of any ZeRO optimization being used. Is that the case, or did I miss anything?
We support DeepSpeed ZeRO, but we don't use it. We will update the README, and you can check it later.
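For anyone who does want to turn ZeRO on, the following is a rough sketch of what a DeepSpeed config selecting a ZeRO stage could look like. It is not the project's actual configuration. The helper `make_zero_config` is hypothetical, and the `"auto"` values are only resolved when DeepSpeed is launched through the Hugging Face Transformers integration. Whether Aurora's training script accepts such a file is an assumption to check against the updated README.

```python
import json


def make_zero_config(stage: int) -> dict:
    """Build a minimal DeepSpeed config dict for a given ZeRO stage.

    Stage 0 disables ZeRO, stage 1 partitions optimizer states,
    stage 2 also partitions gradients, stage 3 also partitions parameters.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # "auto" lets the Hugging Face Trainer fill in values from its own arguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "bf16": {"enabled": "auto"},
        "zero_optimization": {"stage": stage},
    }


# Serialize a ZeRO-2 config; saving it to a file is left to the reader.
config_text = json.dumps(make_zero_config(2), indent=2)
print(config_text)
```

In Trainer-based scripts, a JSON file with this content is typically passed through the `--deepspeed` argument. Note that ZeRO partitions state across data-parallel processes, so on a single GPU there is nothing to partition unless offloading is also configured.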
Roger that! Thanks!