yaoyao-liu / meta-transfer-learning

TensorFlow and PyTorch implementation of "Meta-Transfer Learning for Few-Shot Learning" (CVPR2019)

Home Page: https://lyy.mpi-inf.mpg.de/mtl/


Validation accuracy in the pre-training phase in PyTorch

Sword-keeper opened this issue · comments

I remember you provided your best pre-trained model in an issue, and its validation accuracy is 64%. I want to modify your backbone, but the best validation accuracy in my pre-training phase is only 41%. I also re-ran your pre-training code and found the best validation accuracy was 48%. So, did you use any tricks when pre-training the model?

That model is trained using exactly the same code in the GitHub repository.

Please provide me with more information so that I might give you further suggestions. E.g., how you process the dataset, and what is your PyTorch version.

[screenshot: modified pre-validation forward code]
Firstly, when I used your provided 'max_acc.pth' to run the meta phase, it ran out of memory in the validation phase. When I added `with torch.no_grad()`, it ran smoothly, and the test accuracy matched your result. The same out-of-memory error also occurred in the pre-validation phase when I ran the pre-training phase, so I changed your preval forward code as shown above.

You should not add `with torch.no_grad()`, because we need to compute the gradients with `torch.autograd.grad`.
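To illustrate the point above, here is a minimal sketch (assuming PyTorch is installed; the tensors and names are illustrative, not from the repository): `torch.autograd.grad` needs the computation graph, which `torch.no_grad()` suppresses, so wrapping the inner-loop forward pass in `no_grad` breaks the base-learner update.

```python
import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Normal forward: the graph is recorded, so torch.autograd.grad works.
# This is what a MAML-style base-learner update relies on.
loss = (w * x).sum()
grads = torch.autograd.grad(loss, w)
print(grads[0].shape)  # torch.Size([3])

# Under no_grad, the output is detached and gradient computation fails.
with torch.no_grad():
    loss_ng = (w * x).sum()
print(loss_ng.requires_grad)  # False
try:
    torch.autograd.grad(loss_ng, w)
except RuntimeError:
    print("autograd.grad fails under no_grad")
```

This is why the out-of-memory issue has to be solved some other way (e.g. smaller batches) rather than by disabling gradient tracking during meta validation.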

May I know what GPU you’re using?

My PyTorch version is 1.3.1, and my data preprocessing is the same as yours.

My GPU is an RTX 2080 (8 GB).
I put that part (computing the gradients with `torch.autograd.grad`) in the `optimize_base()` part, before the `with torch.no_grad()`. Did I do something wrong?

In your screenshot, you use a function named self.base. I guess it is a function added by you. Could you please provide me with the details of that function?

Other parts of your code look correct. If you cannot run meta validation during the pre-training phase, you may use normal validation over the 64 training classes instead. You may also try the pre-training code in DeepEMD and FEAT; we use the same pre-training strategy.
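A plain 64-way validation, as suggested above, needs no inner-loop gradients, so `torch.no_grad()` is safe there and the memory problem disappears. A hypothetical sketch (the function name, `model` interface, and loader are assumptions, not the repository's actual code):

```python
import torch

def plain_validation(model, loader, device="cuda"):
    """Standard classification validation for the pre-training phase.

    Assumes `model` maps a batch of images to class logits over the
    64 pre-training classes. Unlike meta (episodic) validation, no
    per-task adaptation happens, so no_grad is safe here.
    """
    model.eval()
    correct = total = 0
    with torch.no_grad():  # fine: no torch.autograd.grad call is needed
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total
```

This only tracks standard classification accuracy, so the numbers are not directly comparable to meta-validation accuracy, but it is enough to monitor pre-training progress.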

Oh, self.base is the base learner in your code. I will rerun this code once more and try other approaches. Thank you!

It seems your change is correct. I am not sure what makes your pre-training accuracy lower than expected; meta-validation accuracy should be around 60% after pre-training. I'll check the related code to see if there is any bug.

I also suggest running exactly the same code with our configuration (PyTorch 0.4.0) if possible. You may also try the other two methods I mentioned; both provide pre-training code.

When I ran your code with PyTorch 0.4.0 on an RTX 2080, there were some bugs in the base learner.
At `net = F.linear(input_x, fc1_w, fc1_b)` it raised `cublas runtime error: the GPU program failed to execute`.
I tried to fix it in many ways but failed. However, when I ran your code on a GTX 1060, it succeeded. So I updated PyTorch, and it ran again. There may be an incompatibility between the RTX 2080, the CUDA version, and the PyTorch version. If someone else hits this problem, you can tell them to change the GPU, PyTorch version, or CUDA version.
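This is consistent with a known class of failures: RTX 20xx (Turing) cards have compute capability 7.5 and need CUDA 10+, while old PyTorch 0.4.0 wheels were built against earlier CUDA versions, which can surface as cuBLAS runtime errors. A small diagnostic sketch (all calls are standard PyTorch APIs) to check the combination before running:

```python
import torch

# Report the PyTorch build and the CUDA version it was compiled against.
# torch.version.cuda is None for CPU-only builds.
print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    # Turing GPUs (e.g. RTX 2080) report compute capability 7.5 and
    # generally need a PyTorch build compiled against CUDA >= 10.
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0),
          f"| compute capability: {major}.{minor}")
```

If the reported CUDA build predates what the GPU requires, upgrading PyTorch (as described above) is the straightforward fix.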

Thanks for reporting this issue.