oscarknagg / few-shot

Repository for few-shot learning machine learning projects

Saving memory to avoid CUDA OOM with GTX-1080Ti

daisukelab opened this issue

Hi,

Thank you very much for sharing this repository; it makes it easy to get started quickly.
However, I keep hitting the CUDA out-of-memory (OOM) error below.

Do you have any suggestions for reducing memory usage?

  • Using the Omniglot dataset.
  • Tried num_workers=1 for the DataLoader, but the error still occurs.
$ python proto_nets.py --dataset omniglot
omniglot_nt=1_kt=60_qt=5_nv=1_kv=5_qv=1
Indexing background...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19280/19280 [00:00<00:00, 281958.22it/s]
Indexing evaluation...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13180/13180 [00:00<00:00, 302261.60it/s]
Training Prototypical network on omniglot...
Begin training...
Epoch 1:   2%|██▎                                                                                                                  | 2/100 [00:06<06:29,  3.98s/it, loss=57.9, categorical_accuracy=0.35]
Traceback (most recent call last):
  File "proto_nets.py", line 129, in <module>
    'distance': args.distance},
  File "/home/me/lab/few-shot/ew-shot/few_shot/train.py", line 113, in fit
    loss, y_pred = fit_function(model, optimiser, loss_fn, x, y, **fit_function_kwargs)
  File "/home/me/lab/few-shot/few_shot/proto.py", line 67, in proto_net_episode
    loss.backward()
  File "/home/me/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/me/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 316.50 MiB (GPU 0; 10.91 GiB total capacity; 9.70 GiB already allocated; 95.38 MiB free; 245.36 MiB cached)
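
For reference, this is how I've been watching GPU memory between episodes to see where the allocation grows (a minimal sketch using PyTorch's built-in counters; where to call it inside the training loop is just my guess):

```python
import torch

def log_gpu_memory(tag: str):
    # Current and peak memory allocated by tensors on GPU 0, in MiB.
    allocated = torch.cuda.memory_allocated(0) / 1024**2
    peak = torch.cuda.max_memory_allocated(0) / 1024**2
    print(f'[{tag}] allocated: {allocated:.1f} MiB, peak: {peak:.1f} MiB')

# e.g. call log_gpu_memory('before backward') / log_gpu_memory('after backward')
# around loss.backward() to see how much memory the autograd graph holds on to.
```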

Hi,

Sorry to bother you once again.
As Murphy's law would have it, I found the problem right after posting this issue...
--k-train was too large; with a smaller value it now runs successfully.

$ python proto_nets.py --dataset omniglot --k-train 5
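
In hindsight the numbers make sense: with the defaults from the run above (n=1, k=60, q=5), each training episode pushes k × (n + q) images through the network at once. A quick back-of-the-envelope check (the formula is my reading of the episode structure, not something from the repo):

```python
def images_per_episode(n: int, k: int, q: int) -> int:
    # Each episode contains n support + q query samples for each of k classes.
    return k * (n + q)

print(images_per_episode(n=1, k=60, q=5))  # 360 images per forward pass (default)
print(images_per_episode(n=1, k=5, q=5))   # 30 images with --k-train 5
```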

BTW, I like the Keras-like API implementation, it's great. :)

Thanks again.

No worries, let me know if you have any more issues.

I recently made the Keras-like API into the pip package olympic. Docs are here: https://olympic-pytorch.readthedocs.io/en/latest/
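
Usage looks roughly like this (a minimal sketch from memory; the argument names may differ slightly, so treat the docs above as authoritative):

```python
import olympic
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy data just to illustrate the Keras-like fit() call.
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(X, y), batch_size=16)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Single call that runs the training loop, in the spirit of Keras' model.fit().
olympic.fit(
    model,
    optimiser=optim.Adam(model.parameters()),
    loss_fn=nn.CrossEntropyLoss(),
    dataloader=loader,
    epochs=5,
)
```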