Is there a separate parameter for using multiple GPUs?
SongDongKuk opened this issue
SongDongKuk commented
Even when I use nn.DataParallel, the model is allocated to only one specific GPU, so I get a CUDA out of memory error.
Could you please check whether there is a way around this?
Jonghwan Mun commented
We recommend using nn.parallel.DistributedDataParallel,
which manages memory more efficiently.
(see link)
Junbum Cha commented
The evaluation code and script we provided are designed to utilize all available GPUs. As a first step, please ensure that you are using the evaluation script correctly (especially the torchrun --nproc_per_node=auto
part). If the issue persists with this script, consider manually setting --nproc_per_node
to match the number of GPUs available.
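Concretely, assuming an evaluation entry point named eval.py (the script name here is a placeholder), the two launch options above would look like:

```shell
# Let torchrun detect the GPUs and spawn one worker process per device
torchrun --nproc_per_node=auto eval.py

# Or pin the worker count explicitly, e.g. on a 4-GPU machine
torchrun --nproc_per_node=4 eval.py
```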