Two A100 GPUs for fine-tuning, insufficient memory?
hekaijie123 opened this issue
Hello author, I am running full-parameter fine-tuning on two A100 GPUs (40 GB each). Whether I use the ZeRO-2 or the ZeRO-3 configuration, I always run out of GPU memory. However, according to the "Model Fine-tuning Memory Usage Statistics" table you provided, the 2-GPU setup should use only 16 GB. How can this be resolved?
Have you set the model's parameters and optimizer to offload to the CPU?
You need to offload the model parameters and the optimizer states to the CPU, which further reduces GPU memory usage:
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
}
}