ronekko / deep_metric_learning

Deep metric learning methods implemented in Chainer


Training efficiency decreases as the epoch number increases

unluckydan opened this issue · comments

Hello, I use a GTX 980 Ti to run your code. I noticed that training runs at about 1.44 iter/s during epochs 0 to 2, but then the speed drops sharply to around 1.4 s/iter. As the epoch number increases, the speed gets even worse.

Do you have any clue about this? I am sure there is enough memory to handle this task.

Hi.

I guess using a smaller batch_size (e.g. 100 or 80) may solve your problem, though I'm not sure what the true cause of the slowdown is.

Chainer + CuPy use a memory pool that can consume more memory than a strictly necessary allocation would: to avoid repeated malloc/free calls, already allocated memory is reused instead of being freed whenever possible.
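If you want to see how much the pool itself is holding, here is a minimal sketch using CuPy's default memory pool API (attribute names may differ slightly in very old alpha releases), which you could call at the end of each epoch:

```python
import cupy

def report_gpu_memory(tag=""):
    """Print how much memory CuPy's default pool holds vs. actually uses."""
    pool = cupy.get_default_memory_pool()
    used = pool.used_bytes() / 1024**3    # memory backing live arrays
    total = pool.total_bytes() / 1024**3  # memory held by the pool, including cached blocks
    print("[{}] pool used: {:.2f} GiB / pool total: {:.2f} GiB".format(tag, used, total))
```

If the "pool total" keeps growing epoch after epoch while "pool used" stays roughly constant, the pool is caching more than the model really needs.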

In my environment (Chainer 4.0.0a1, CuPy 3.0.0a1), running main_n_pair_mc.py with batch size 120 uses 6.2 GB of GPU memory.
But a GTX 980 Ti has only 6 GB, so this probably forces reallocations, which makes the training slow.
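As a possible workaround (just a sketch I have not verified on this repo, and the method name may differ on older CuPy versions), you could also try returning the cached but unused blocks to the driver between epochs so the pool shrinks back before the next epoch starts:

```python
import cupy

def release_cached_gpu_memory():
    """Return cached (unused) blocks from CuPy's memory pools to the driver."""
    # Only frees blocks that no live array references, so it is safe between epochs.
    cupy.get_default_memory_pool().free_all_blocks()
    cupy.get_default_pinned_memory_pool().free_all_blocks()
```

That said, the cleanest fix is still lowering batch_size so that the peak usage fits within the 6 GB of the 980 Ti.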