malllabiisc / RESIDE

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

OOM appears when training the model

jacypne opened this issue · comments

Hi! I have a problem like the one in issue #18. I tried reducing the batch size and dimensions as you suggested there, but even with a batch size of 1 I still get the same OOM error. My graphics card is an Nvidia GeForce RTX 2080 Ti (11 GB). Why does memory usage keep accumulating while the program is running? Moreover, when I set lstm_dim to 64 and the batch size to 32, I got an AUC of 0.397, which does not match the AUC of 0.416 on Riedel reported in your paper. Could you please help? The error is shown below:
[Image: screenshot of the error output]

Hi, can you try running it on the CPU to check whether it works?
Use a larger LSTM dimension, such as 128 or 192, to match the reported performance.
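One common way to force a CPU-only run without touching the training code is to hide the GPUs through the `CUDA_VISIBLE_DEVICES` environment variable. The snippet below is only a minimal sketch of that idea: `hide_gpus` is a hypothetical helper name, not something from the RESIDE repository, and the variable has to be set before TensorFlow is imported for it to take effect.

```python
import os


def hide_gpus(env=None):
    """Return a copy of `env` in which no CUDA device is visible.

    `hide_gpus` is a hypothetical helper for illustration only.
    An empty CUDA_VISIBLE_DEVICES exposes no GPU to CUDA-based
    libraries, so they fall back to the CPU.
    """
    env = dict(os.environ if env is None else env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


if __name__ == "__main__":
    # Apply to the current process before importing TensorFlow.
    os.environ.update(hide_gpus())
    print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

The returned dictionary can also be passed as the `env` argument of `subprocess.run` to launch the training script in a CPU-only child process.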

@svjan5 Hi, I am getting the same error on an Nvidia 1080 Ti. It does work on the CPU, though very slowly. Whatever batch size I try, it always runs out of memory. How can we reproduce the results with this hardware?

I trained all the models on a 1080 Ti only. It used to take around 8 GB of memory. I think newer versions of TensorFlow might need more memory. Can you try version 1.8?

Thanks for the hint, I will try it and let you know how it goes. 8 GB should be fine; we have the same card, with 11 GB I believe. The strange thing is that training always starts normally, with reasonable memory usage, but after a random number of steps the OOM error appears. It seems that CUDA memory keeps accumulating during training.
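To check whether memory really grows step by step, GPU usage can be sampled with `nvidia-smi` while training runs. This is a rough sketch, not part of RESIDE: `parse_memory` and `gpu_memory` are hypothetical helper names, and the script assumes `nvidia-smi` is on the PATH. Note that TensorFlow by default reserves most of the GPU memory up front, so the reported figure may stay flat even if usage inside the process grows.

```python
import subprocess

# Standard nvidia-smi query: used and total memory per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(output):
    """Parse nvidia-smi CSV output into [(used_mib, total_mib), ...]."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Query every visible GPU once (requires nvidia-smi)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_memory(out)


if __name__ == "__main__":
    for index, (used, total) in enumerate(gpu_memory()):
        print("GPU %d: %d / %d MiB" % (index, used, total))
```

Calling `gpu_memory()` every few hundred training steps and logging the result gives a simple trace to attach to a report like this one.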

Hi, we have tried 1.8 and still have the same problem. Do you have any other suggestions? Could you post a list of the packages you use so we can compare?

I am sorry for the late reply. I can still run my code without any OOM error. You can see the output of pip freeze here: https://codebunk.com/b/694343322/
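For anyone comparing their environment against a `pip freeze` listing like the one linked above, a small script can show which pinned versions differ. The following is a sketch under assumptions: `parse_freeze` and `diff_freeze` are hypothetical names, only `name==version` lines are handled, and the sample package versions are made up for illustration rather than taken from the linked output.

```python
def parse_freeze(text):
    """Map lowercased package name to version for 'name==version' lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue  # skip comments, blank lines, and unpinned entries
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_freeze(mine, theirs):
    """Return {package: (my_version, their_version)} where they differ.

    A missing package is reported with None on that side.
    """
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up sample listings, for illustration only.
    mine = "tensorflow-gpu==1.12.0\nnumpy==1.14.5\n"
    theirs = "tensorflow-gpu==1.8.0\nnumpy==1.14.5\nscipy==1.1.0\n"
    for name, (my_v, their_v) in diff_freeze(mine, theirs).items():
        print("%s: mine=%s theirs=%s" % (name, my_v, their_v))
```

In practice the two inputs would be the text of a local `pip freeze` and the maintainer's posted listing.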