In the process of training ABCNetv2, the GPU memory usage gradually increases
TAOSHss opened this issue · comments
TSH333 commented
I trained with 4 GPUs and SOLVER.IMS_PER_BATCH=4.
At the beginning of training, each GPU used a little over 9000 MB, but after roughly 90,000 iterations the usage became uneven, and one card grew to over 15000 MB.
Have you encountered this during your training, and how did you deal with it? I'm running in a Docker environment. I searched the related issues in the detectron2 repository but didn't find an effective answer.
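A minimal sketch (not from this issue) of one way to confirm whether the growth is a steady leak rather than normal fluctuation: record a per-iteration memory reading for each GPU and compare early vs. late averages. In a real training loop the reading would come from something like `torch.cuda.max_memory_allocated(device)`; here the readings are injected as plain numbers so the logic runs without a GPU, and the function name `detect_growth` is my own, not part of any library.

```python
def detect_growth(readings, window=100, ratio=1.2):
    """Return True if the mean of the last `window` readings exceeds
    the mean of the first `window` readings by factor `ratio` or more.

    `readings` is a list of memory-usage samples (e.g. MB), one per
    iteration, for a single GPU.
    """
    if len(readings) < 2 * window:
        return False  # not enough history to compare
    head = sum(readings[:window]) / window    # early-training average
    tail = sum(readings[-window:]) / window   # recent average
    return tail >= head * ratio

# Example: stable usage vs. a slow leak like the one described above.
flat = [9000] * 300                          # holds ~9000 MB throughout
leak = [9000 + 20 * i for i in range(300)]   # creeps upward every iteration
```

Running this on a leak curve like `leak` returns True, while `flat` returns False; logging which GPU trips the check can also show whether only one rank is accumulating (e.g. rank 0 gathering tensors without `.item()` or `.detach()`, a common cause of this pattern).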