60+ gigs of RAM needed?
shubhvachher opened this issue · comments
So, after #91 and #92, and after setting the `-gpu` flag as described here:
Is this usage normal?!
The command I ran is:

```
CUDA_VISIBLE_DEVICES=4,9 th train.lua -data ./datasets/ -style_image ./data/textures/862.png -style_size 600 -image_size 512 -model johnson -batch_size 4 -learning_rate 1e-2 -style_weight 10 -style_layers relu1_2,relu2_2,relu3_2,relu4_2 -content_layers relu4_2 -gpu 1
```
`nvidia-smi` shows about 800 MiB + 350 MiB allotted on the two GPU cards I made visible, but I'm not sure they are actually being used. CPU usage is at about 400%.
What is most shocking is the RAM usage: 60+ GB?! Is this normal? My `train/dummy` folder has 15 images and `val/dummy` has 5, although I think the RAM usage is independent of those counts. Any guidance is appreciated.
My program got terminated because the machine ran out of memory, and I think that happened after it had been running for more than 15 hours. Any pointers, @DmitryUlyanov? TIA!
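To catch runaway memory growth before the OOM kill, a minimal Linux-only sketch can be used to watch the trainer's resident set size via `/proc` (the helper name `rss_gib` is my own, not part of the repo):

```python
import os

def rss_gib(pid):
    """Resident set size of a process in GiB (reads Linux /proc; sketch only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024 ** 2  # VmRSS is reported in kB
    return 0.0

# e.g. poll the trainer's PID in a loop, or your own process:
# rss_gib(os.getpid())
```

Logging this once a minute alongside a timestamp would show whether the 60+ GB is a steady leak or a single allocation spike.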
After a few well-placed print statements, I've isolated the problem to train.lua line 2014:

```lua
local images = trainLoader:get()
```

I'm quite new to Torch. Any suggestions?
The problem is the `-image_size` parameter: it needs to be less than or equal to the smaller dimension of the training images. E.g., if an image is (480, 640), then use `-image_size 480` or smaller. Otherwise the program gets stuck in the `while true do` loop in `dataloader.lua`. Such a pain... Closing the issue.
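A pre-flight check along these lines could flag offending images before launching a long run. This is a stdlib-only Python sketch with hypothetical helper names (`png_size`, `too_small`); it assumes the training images are PNGs and reads the dimensions straight from the IHDR chunk:

```python
import struct
from pathlib import Path

def png_size(path):
    """Return (width, height) of a PNG by parsing its IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    assert header[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    return struct.unpack(">II", header[16:24])  # big-endian width, height

def too_small(folder, image_size):
    """List PNGs in folder whose smaller side is below image_size."""
    return [p.name for p in sorted(Path(folder).glob("*.png"))
            if min(png_size(p)) < image_size]
```

Running `too_small("datasets/train/dummy", 512)` before training would have surfaced the images that send `dataloader.lua` into its infinite loop.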