CUDA out of memory.
kobeshegu opened this issue
Hi, Edward.
Thanks a lot for your excellent work.
I ran into this problem on a machine with a single 3090 (20 GB) after training for 2000 iterations.
Any idea how to fix it?
I trained the model with your recommended settings:
python train.py --conf configs/flower_lofgan.yaml --output_dir results/flower_lofgan --gpu 0
RuntimeError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 23.70 GiB total capacity; 18.17 GiB already allocated; 120.56 MiB free; 22.21 GiB reserved in total by PyTorch)
To address this issue, I have tried the following in the training loop:
- torch.cuda.empty_cache()
- del imgs, label
- gc.collect()
However, none of them helped.
I also tried calling detach() on the loss items, but I still hit the same issue after 2000 iterations.
Hi @kobeshegu, sorry about that. The experiments in the paper were conducted on a V100 GPU, so we missed this problem.
Since snapshot_val_iter is set to 2000, a possible reason is that the model is evaluated every 2000 iterations. Maybe you can skip the evaluation step by commenting out the following code in train.py:
```python
# if (iterations + 1) % config['snapshot_val_iter'] == 0:
#     with torch.no_grad():
#         imgs_test = imgs_test.cuda()
#         fake_xs = []
#         for i in range(config['num_generate']):
#             fake_xs.append(trainer.generate(imgs_test).unsqueeze(1))
#         fake_xs = torch.cat(fake_xs, dim=1)
#         write_image(iterations, image_directory, imgs_test.detach(), fake_xs.detach())
```
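Alternatively, if you want to keep the validation snapshots, a possible (untested) sketch is to release the evaluation tensors right after the images are written, reusing the names from the snippet above:

```python
if (iterations + 1) % config['snapshot_val_iter'] == 0:
    with torch.no_grad():
        imgs_test = imgs_test.cuda()
        fake_xs = []
        for i in range(config['num_generate']):
            fake_xs.append(trainer.generate(imgs_test).unsqueeze(1))
        fake_xs = torch.cat(fake_xs, dim=1)
        write_image(iterations, image_directory, imgs_test.detach(), fake_xs.detach())
    # Move the held-over test batch back to the CPU, drop the generated
    # samples, and return the cached blocks to the allocator so the next
    # training step starts from a smaller footprint.
    imgs_test = imgs_test.cpu()
    del fake_xs
    torch.cuda.empty_cache()
```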
Thanks for your reply. I don't think it is caused by the evaluation, since you use torch.no_grad() there in the code. Moreover, I get stuck at 1400 iterations when I move trainer.cuda() into the loop and detach the loss items:

```python
loss_total = loss_adv_dis_real.detach() + loss_adv_dis_fake.detach() + loss_cls_dis.detach()
```

Changes in the training loop:
```python
while True:
    with torch.autograd.set_detect_anomaly(True):
        imgs_test, _ = iter(test_dataloader).next()
        trainer = Trainer(config)
        iterations = trainer.resume(checkpoint_directory) if args.resume else 0
        for it, (imgs, label) in enumerate(train_dataloader):
            trainer.cuda()
            trainer.update_lr(iterations, max_iter)
            imgs = imgs.cuda()
            label = label.cuda()
            trainer.zero_grad()
            trainer.dis_update(imgs, label)
            trainer.zero_grad()
            trainer.gen_update(imgs, label)
            # try:
            #     trainer.dis_update(imgs, label)
            #     trainer.gen_update(imgs, label)
            # except RuntimeError as exception:
            #     if "out of memory" in str(exception):
            #         print("WARNING: out of memory")
            #         if hasattr(torch.cuda, 'empty_cache'):
            #             torch.cuda.empty_cache()
            #     else:
            #         raise exception
            if (iterations + 1) % config['snapshot_log_iter'] == 0:
                end = time.time()
                print("Iteration: [%06d/%06d], time: %d, loss_adv_dis: %04f, loss_adv_gen: %04f"
                      % (iterations + 1, max_iter, end - start, trainer.loss_adv_dis, trainer.loss_adv_gen))
                write_loss(iterations, trainer, train_writer)
            del imgs, label
            gc.collect()
            torch.cuda.empty_cache()
```
FYI, the NVIDIA 3090 GPU I use has 20 GB of memory. I don't know how large your V100 is, but I think my hardware should be enough to run the code, as I have run StyleGAN2 and FUNIT on it.
Looking forward to your reply.
Thanks again for your help.
Hi there, I use a 32 GB V100 and the memory cost for training is about 21 GB. I found that the huge memory cost occurs when applying the gradient penalty for the discriminator. When I removed it, the memory consumption plummeted, but for now I have no idea how to fix it properly. I tried setting inplace=True for the activations, which saves about 600 MB of memory, but I'm not sure whether that works for you. Otherwise, you may have to use a smaller batch size...
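For reference, the spike comes from the double backward that a gradient penalty requires. Below is a generic WGAN-GP-style sketch (not the exact code in trainer.py, just an illustration of where the memory goes), together with the inplace activation change mentioned above:

```python
import torch
import torch.nn as nn

def gradient_penalty(dis, real, fake):
    # Interpolate between real and generated samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    out = dis(interp)
    # create_graph=True keeps the graph of the backward pass itself so the
    # penalty can be differentiated during the discriminator update; this
    # second graph is what drives the extra memory.
    grads = torch.autograd.grad(outputs=out.sum(), inputs=interp,
                                create_graph=True)[0]
    return ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

# The inplace change: let LeakyReLU overwrite its input buffer instead of
# allocating a new tensor for the output.
act = nn.LeakyReLU(0.2, inplace=True)
```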
Alright then, I'll try some other approaches.
Thanks for your patience^.^
Sorry, it's me again. Does the gradient penalty make a big difference to the performance? The error no longer shows up when I remove the gradient penalty. Thanks!
How did you deal with it? I have one GPU with 8 GB of memory, and I want to run this method. Can you give me some advice on whether that's possible? Thanks!