60+ gigs of RAM needed?
shubhvachher opened this issue · comments
So, after #91 and #92, and after setting the `-gpu` flag as described here:
Is this usage normal?!
The command I ran is:

```
CUDA_VISIBLE_DEVICES=4,9 th train.lua -data ./datasets/ -style_image ./data/textures/862.png -style_size 600 -image_size 512 -model johnson -batch_size 4 -learning_rate 1e-2 -style_weight 10 -style_layers relu1_2,relu2_2,relu3_2,relu4_2 -content_layers relu4_2 -gpu 1
```
`nvidia-smi` shows about 800 MiB + 350 MiB allotted on the two GPU cards I made visible, but I'm not sure they are actually being used. CPU usage is at about 400%.
What is most shocking is the RAM usage: 60+ GB?! Is this normal? My `train/dummy` folder has 15 images and `val/dummy` has 5, although I think the RAM usage is independent of those counts. Any guidance is appreciated.
My program got terminated because the machine ran out of memory, and I think that happened after it had been running for more than 15 hours. Any pointers, @DmitryUlyanov? TIA!
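To catch runaway memory growth before the OOM kill, a minimal Linux-only sketch can be used to watch the trainer's resident set size via `/proc` (the helper name `rss_gib` is my own, not part of the repo):

```python
import os

def rss_gib(pid):
    """Resident set size of a process in GiB (reads Linux /proc; sketch only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024 ** 2  # VmRSS is reported in kB
    return 0.0

# e.g. poll the trainer's PID in a loop, or your own process:
# rss_gib(os.getpid())
```

Logging this once a minute alongside a timestamp would show whether the 60+ GB is a steady leak or a single allocation spike.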
After a few well-placed print statements, I've isolated the problem to train.lua line 2014:

```lua
local images = trainLoader:get()
```

I'm quite new to Torch. Any suggestions?
The problem is the `-image_size` parameter: it needs to be less than or equal to the smaller dimension of the training images. E.g., if an image is (480, 640), then use `-image_size 480` or smaller. Otherwise the program gets stuck in the `while true do` loop in `dataloader.lua`. Such a pain... Closing the issue.
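A pre-flight check along these lines could flag offending images before launching a long run. This is a stdlib-only Python sketch with hypothetical helper names (`png_size`, `too_small`); it assumes the training images are PNGs and reads the dimensions straight from the IHDR chunk:

```python
import struct
from pathlib import Path

def png_size(path):
    """Return (width, height) of a PNG by parsing its IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    assert header[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    return struct.unpack(">II", header[16:24])  # big-endian width, height

def too_small(folder, image_size):
    """List PNGs in folder whose smaller side is below image_size."""
    return [p.name for p in sorted(Path(folder).glob("*.png"))
            if min(png_size(p)) < image_size]
```

Running `too_small("datasets/train/dummy", 512)` before training would have surfaced the images that send `dataloader.lua` into its infinite loop.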