Out of GPU memory during learning

Question

A-Cepheus opened this issue a year ago · comments

It seems that one iteration completes, but an OOM error occurs during the second iteration. Any idea why?

A-Cepheus · Answer 1 · Fri Sep 08 2023 10:57:50 GMT+0800 (China Standard Time)

Maybe I should keep reducing the batch size?

Jonathan Laurent · Answer 2 · Fri Sep 08 2023 17:33:32 GMT+0800 (China Standard Time)

Yes, you should probably reduce the batch size.

A-Cepheus · Answer 3 · Sat Sep 09 2023 21:55:21 GMT+0800 (China Standard Time)

Now I get a new error.

Jonathan Laurent · Answer 4 · Sat Sep 09 2023 22:52:43 GMT+0800 (China Standard Time)

Out-of-memory errors often show up as other errors. I would reduce the batch size and/or network size even further.
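As a rough illustration of that advice, here is a minimal sketch of halving the batch size until a learning step succeeds. It is not code from the library discussed in this thread: it is written in Python only to keep it self-contained, and `run_step`, `fake_step`, and `limit` are hypothetical stand-ins. Because an out-of-memory failure may surface as a different error, the sketch treats any exception as a possible OOM.

```python
def find_working_batch_size(run_step, start, minimum=1):
    """Halve the batch size until run_step(batch_size) succeeds.

    run_step is a hypothetical callable that runs one learning step and
    raises an exception on failure. Any exception is treated as a possible
    out-of-memory failure, since OOM can surface as an unrelated error.
    """
    batch_size = start
    last_error = None
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except Exception as err:  # deliberately broad, see docstring
            last_error = err
            batch_size //= 2  # back off and retry with half the batch
    raise RuntimeError(f"no batch size >= {minimum} worked") from last_error


def fake_step(batch_size, limit=300):
    # Stand-in for a real learning step (assumption for illustration):
    # pretend that any batch larger than `limit` samples does not fit.
    if batch_size > limit:
        raise MemoryError(f"batch of {batch_size} does not fit")
```

With the stand-in above, `find_working_batch_size(fake_step, 1024)` tries 1024 and 512, which fail, and returns 256. Note that a batch size that works for one step can still fail later if memory use grows across iterations, which matches what was reported at the top of this thread.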

A-Cepheus · Answer 5 · Fri Sep 15 2023 11:39:27 GMT+0800 (China Standard Time)

It does seem that the OOM problem appears as the size of mem_buff increases.

A-Cepheus · Answer 6 · Mon Oct 02 2023 22:59:04 GMT+0800 (China Standard Time)

Here is a possible cause that I am looking into: FluxML/FluxTraining.jl#148