yikang-li / iQAN

Visual Question Generation as Dual Task of Visual Question Answering (PyTorch Version)

Home Page: http://cvboy.com/publication/cvpr2018_iqan/

Too much cache when training.

jingchenchen opened this issue

Since the features are stored in a huge .h5 file (124 GB), training the model takes too much cache memory. It seems that the opened .h5 file does not free the memory: once a sample is read, it stays in the cache until the file is closed. This also makes training slow (several hours for one epoch), because the remaining free memory is very limited. How can I handle this problem? I tried to clear the cache but failed; the cache can only be cleared when the h5 file is closed. Since the file is so large, it is not realistic to load the whole file into memory or keep it in the cache.
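One possible way to release the OS page cache for the feature file without closing it in the training process is to advise the kernel to drop the file's cached pages. This is an assumption on our part, not something suggested in this thread: the sketch uses Python's `os.posix_fadvise` with `POSIX_FADV_DONTNEED` (Unix/Linux only), and the file name and interval in the usage comment are hypothetical. Evicting pages frees memory but sends later reads back to disk, so it bounds memory use rather than making an epoch faster. HDF5's own chunk cache is a separate, per-file cache, which h5py sizes through the `rdcc_nbytes` argument of `h5py.File`.

```python
import os


def drop_file_page_cache(path):
    """Ask the kernel to evict the cached pages of `path` from the OS page cache.

    Returns True if the advice was issued, or False if os.posix_fadvise is
    unavailable on this platform (e.g. macOS or Windows).
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 mean "from the start to the end of the file".
        # POSIX_FADV_DONTNEED is only advice: the kernel drops the clean pages
        # it can, even while the file is still open elsewhere (e.g. by h5py).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


# Hypothetical usage inside a training loop ("features.h5" and 1000 are
# illustrative values, not taken from the iQAN code):
# for step, batch in enumerate(loader):
#     ...
#     if step % 1000 == 0:
#         drop_file_page_cache("features.h5")
```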

Actually, I train the model on a server with 512 GB of memory. The first time you load the data, it takes a long time (as you said, hours). But after that, if your memory is large enough, it takes much less time (no more than 1 s per iteration).
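The reply depends on whether the whole feature file fits in RAM, so that after the first pass every read is served from the page cache. A quick check of that condition can be sketched as follows. The 0.75 headroom factor is an assumed safety margin, not a figure from the thread, and `physical_memory_bytes` relies on the POSIX `sysconf` names available on Linux.

```python
import os

GIB = 1024 ** 3


def physical_memory_bytes():
    """Total physical RAM via POSIX sysconf (Linux; may be unavailable elsewhere)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_page_cache(file_size, total_mem, headroom=0.75):
    """Return True if a file of `file_size` bytes fits in `headroom` of RAM.

    `headroom` is an assumed margin that leaves room for the model, the
    data loader workers and the rest of the system.
    """
    return file_size <= headroom * total_mem


# Numbers from this thread: a 124 GB feature file.
print(fits_in_page_cache(124 * GIB, 512 * GIB))  # True
print(fits_in_page_cache(124 * GIB, 128 * GIB))  # False
# On a real machine (hypothetical path):
# fits_in_page_cache(os.path.getsize("features.h5"), physical_memory_bytes())
```

With 124 GB of features, the 512 GB server passes the check while a 128 GB machine does not, which is consistent with the slow epochs described in the question.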