fyu / dilation

Dilated Convolution for Semantic Image Segmentation

Home Page: https://www.vis.xyz/pub/dilation

GPU memory / joint training

Timo-hab opened this issue · comments

I am trying to reproduce the results for the Cityscapes dataset. I am now at the joint training stage, and the paper says the crop size was 1396x1396 px (half the image plus the label margin) with a batch size of 1. Surprisingly, this exceeds 12 GB of GPU memory, so I can't start the training. 1180x1180 px is the maximum that fits, and that is already when running in a virtual console.

Can that be related to the cuDNN version? I tried versions 2, 3, 4, 5, and 5.1 and could not observe any difference in memory requirements; in all cases 1180x1180 px was the maximum.
That's kind of strange, isn't it? I would expect a difference between cuDNN versions.
Could you please give me some advice? Thank you in advance!

I have a similar issue. I can only run the program in CPU mode because it requires about 18 GB of memory.

@Timo-hab @chenqifeng22 test_batch has to be 0 for Cityscapes joint training. Was it set correctly?
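For anyone else hitting this, a minimal sketch of what the joint-training invocation with the test phase disabled might look like. The parameter names (train_batch, test_batch, crop_size) come from this thread, but whether they are command-line flags or config entries, and the exact spelling, is an assumption, so check `python train.py --help` in your checkout:

```sh
# Sketch only, not the canonical command: option names are assumed.
# test_batch=0 disables the test phase, so only the training pass
# has to fit in GPU memory during Cityscapes joint training.
python train.py joint \
    --train_batch 1 \
    --test_batch 0 \
    --crop_size 1396
```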

I only tried predict.py on a single image. I get an "out of memory" error with a Titan X.

@chenqifeng22 You have to check your setup. A lot of people have confirmed to me that predict.py can work well.

@fyu I need to install cuDNN to make it work.

@fyu Thank you for your answer. test_batch wasn't set to 0. Now I can start the training, but of course I don't see any test results.
Is it possible to test the model on the validation dataset during training, so I can monitor accuracy and loss, without needing more GPU memory?

I also started to play around with the code a bit and wanted to try it on the Cityscapes test image (not training, just predict.py).
It gives an out-of-memory error when I run it on my GPU with --gpu 0. Did you guys find a solution for this? Is 18 GB of GPU memory really required? If so, is there any way to reduce the memory footprint at test time? I use cuDNN v5.1 and CUDA 8.0.

By the way: it works on KITTI with the GPU, and on Cityscapes only with the CPU.

@manuelschmidt Depending on the GPU model, the dilation network needs a different amount of memory (which is quite strange to me).
There is a simple solution: just make the input image size smaller in the prototxt file. You also need to change "if dataset.zoom > 1:" to "if dataset.zoom >= 1:" in predict.py to prevent a bug.
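To make the suggested edit concrete, here is a minimal sketch of the changed condition in predict.py. Only the comparison itself comes from the comment above; the body of the branch is elided:

```python
# predict.py (sketch): widen the zoom check so that zoom == 1 goes through
# the same branch instead of falling through and triggering the bug.
if dataset.zoom >= 1:   # was: if dataset.zoom > 1:
    ...                 # existing zoom/upsampling code, unchanged
```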

@fyu I am training on the ADE20K dataset with the Dilation network in frontend mode on a GPU with 8 GB of memory, and I get an 'out of memory' error. Is 8 GB enough for training?
My train batch is set to 10 and the test batch to 0.
Thank you

@qualia0000 Besides the batch size, the memory consumption also depends on the crop size.

@fyu Thank you for your reply. The crop size is set to 500. I'll look for ways to reduce the memory consumption.
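As a rough illustration of how the two knobs interact, activation memory grows roughly linearly with batch size and quadratically with crop size. The back-of-envelope sketch below is only an estimate (it ignores fixed costs such as the network weights, and the helper name is made up for illustration), relating the settings discussed in this thread:

```python
def relative_memory(batch_size, crop_size, ref_batch=1, ref_crop=1396):
    """Activation memory relative to the 1396x1396, batch-1 joint-training
    setting mentioned above, using a rough batch * crop^2 scaling."""
    return (batch_size / ref_batch) * (crop_size / ref_crop) ** 2

# Batch 10 at crop 500 lands in the same ballpark as batch 1 at crop 1396:
print(relative_memory(10, 500))   # ~1.28
# Halving the crop side at a fixed batch size cuts activation memory ~4x:
print(relative_memory(1, 698))    # 0.25
```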