oscarknagg / few-shot

Repository for few-shot learning machine learning projects

Saving memory to avoid CUDA OOM with GTX-1080Ti

daisukelab opened this issue

Hi,

Thank you very much for sharing this repository; it makes it easy to get started quickly.
However, I keep hitting the CUDA out-of-memory (OOM) error below.

Do you have any suggestions for reducing memory usage?

  • Using the Omniglot dataset.
  • Tried num_workers=1 for the DataLoader, but the error still occurs.
$ python proto_nets.py --dataset omniglot
omniglot_nt=1_kt=60_qt=5_nv=1_kv=5_qv=1
Indexing background...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19280/19280 [00:00<00:00, 281958.22it/s]
Indexing evaluation...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13180/13180 [00:00<00:00, 302261.60it/s]
Training Prototypical network on omniglot...
Begin training...
Epoch 1:   2%|██▎                                                                                                                  | 2/100 [00:06<06:29,  3.98s/it, loss=57.9, categorical_accuracy=0.35]
Traceback (most recent call last):
  File "proto_nets.py", line 129, in <module>
    'distance': args.distance},
  File "/home/me/lab/few-shot/ew-shot/few_shot/train.py", line 113, in fit
    loss, y_pred = fit_function(model, optimiser, loss_fn, x, y, **fit_function_kwargs)
  File "/home/me/lab/few-shot/few_shot/proto.py", line 67, in proto_net_episode
    loss.backward()
  File "/home/me/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/me/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 316.50 MiB (GPU 0; 10.91 GiB total capacity; 9.70 GiB already allocated; 95.38 MiB free; 245.36 MiB cached)
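
For reference, this is how I've been watching GPU memory between episodes to see where the allocation grows (a minimal sketch using PyTorch's built-in counters; where to call it inside the training loop is just my guess):

```python
import torch

def log_gpu_memory(tag: str):
    # Current and peak memory allocated by tensors on GPU 0, in MiB.
    allocated = torch.cuda.memory_allocated(0) / 1024**2
    peak = torch.cuda.max_memory_allocated(0) / 1024**2
    print(f'[{tag}] allocated: {allocated:.1f} MiB, peak: {peak:.1f} MiB')

# e.g. call log_gpu_memory('before backward') / log_gpu_memory('after backward')
# around loss.backward() to see how much memory the autograd graph holds on to.
```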

Hi,

Sorry to bother you once again.
As Murphy's law would have it, I found the problem right after posting this issue...
--k-train was too large; with a smaller value it now runs successfully.

$ python proto_nets.py --dataset omniglot --k-train 5
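
In hindsight the numbers make sense: with the defaults from the run above (n=1, k=60, q=5), each training episode pushes k × (n + q) images through the network at once. A quick back-of-the-envelope check (the formula is my reading of the episode structure, not something from the repo):

```python
def images_per_episode(n: int, k: int, q: int) -> int:
    # Each episode contains n support + q query samples for each of k classes.
    return k * (n + q)

print(images_per_episode(n=1, k=60, q=5))  # 360 images per forward pass (default)
print(images_per_episode(n=1, k=5, q=5))   # 30 images with --k-train 5
```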

BTW, I like the Keras-like API implementation, it's great. :)

Thanks again.

No worries, let me know if you have any more issues.

I recently made the Keras-like API into the pip package olympic. Docs are here: https://olympic-pytorch.readthedocs.io/en/latest/
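
Usage looks roughly like this (a minimal sketch from memory; the argument names may differ slightly, so treat the docs above as authoritative):

```python
import olympic
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy data just to illustrate the Keras-like fit() call.
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(X, y), batch_size=16)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Single call that runs the training loop, in the spirit of Keras' model.fit().
olympic.fit(
    model,
    optimiser=optim.Adam(model.parameters()),
    loss_fn=nn.CrossEntropyLoss(),
    dataloader=loader,
    epochs=5,
)
```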