Frankluox / LightningFSL

LightningFSL: PyTorch Lightning implementations of few-shot learning models.

Problems occurred when reimplementing COSOC

Taylorfire opened this issue · comments

commented

Hi, when I tried to reimplement COSOC, I ran into two problems:

  1. Multi-GPU training: I followed the guidance in "4. Training COSOC" and finished training the exemplar and running the COS algorithm. However, when I tried to use two TITAN V (12 GB) GPUs to run the FSL algorithm with COS, it failed with "CUDA out of memory". More precisely, training itself ran normally, but the error appeared as soon as validation started.
    I didn't modify any hyperparameters; the batch size at this stage is still 128.

  2. Training with a single GPU and a smaller batch size: given the problem above, I also tried training on a single GPU with batch size 32 (the maximum that fits). But after 36/60 epochs the validation results looked as if nothing had been learned.
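One possible reason the OOM appears only during validation (a guess, not confirmed from the repo): few-shot validation runs in episodes, and an episode with many query images can put more images through the network per forward pass than a training batch does. A back-of-the-envelope estimate of the input-tensor sizes, with all shapes and counts purely hypothetical:

```python
# Rough input-tensor memory estimate comparing a training batch with a
# few-shot validation episode batch. All numbers here are hypothetical
# illustrations, not values read from LightningFSL.

def tensor_mb(n_images, channels=3, height=84, width=84, bytes_per_elem=4):
    """Memory of a float32 image batch in MiB."""
    return n_images * channels * height * width * bytes_per_elem / 1024**2

# Training: a plain batch of 128 images.
train_batch_mb = tensor_mb(128)

# Validation: e.g. 4 episodes per batch, each 5-way with 5 support and
# 15 query images per class -> 5 * (5 + 15) = 100 images per episode.
n_way, n_shot, n_query, episodes = 5, 5, 15, 4
val_images = episodes * n_way * (n_shot + n_query)
val_batch_mb = tensor_mb(val_images)

print(f"train batch: {train_batch_mb:.1f} MiB for 128 images")
print(f"val batch:   {val_batch_mb:.1f} MiB for {val_images} images")
```

If something like this is the cause, lowering the number of validation episodes per batch or the query/shot counts shrinks validation memory without touching the training batch size.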

I would be very thankful for your reply!
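On the second problem: when a batch size tuned at 128 is cut to 32 and the learning rate is left unchanged, training often stalls. A common heuristic (the linear scaling rule; a suggestion, not something this repo prescribes, and the base value below is hypothetical) is to scale the learning rate by the same factor as the batch size:

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: learning rate proportional to batch size."""
    return base_lr * new_batch_size / base_batch_size

# Hypothetical numbers: a base learning rate of 0.1 tuned for batch size 128.
new_lr = scale_lr(0.1, base_batch_size=128, new_batch_size=32)
print(new_lr)  # 0.025
```

This is only a starting point; a short sweep around the scaled value is usually still needed.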

Hi, thanks for reporting the bug! It is now fixed, please try again. (Modifications to set_config_COSOC.py and SOC.py: I changed the val shot, the learning rate, and the number of epochs, added a plugin that suppresses the warning, and fixed a bug in SOC.py.)

commented


Thanks for your helpful reply and solution! Training now runs successfully with multiple GPUs. I am waiting for a reasonable result, which may take some time; I hope you don't mind my keeping this issue open for a while.