error for single GPU training

zqyJason opened this issue a year ago · comments

zqyJason commented a year ago

Before Asking

I have read the README carefully.
I want to train on my custom dataset, and I have carefully read the tutorial on fine-tuning with custom data and organized my dataset correctly.
I have pulled the latest code from the main branch and run it again, and the problem still exists.

Search before asking

I have searched the DAMO-YOLO issues and found no similar questions.

Question

When I train damoyolo_tinynasL35_M on my custom dataset on a single GPU (Tesla T4), this error is always thrown:

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.

However, no other processes are occupying my GPU.
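One way to double-check that the GPU really is idle before launching training is to read its memory usage from nvidia-smi. The following is a minimal sketch, not part of DAMO-YOLO: the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and it assumes nvidia-smi is on the PATH and reports plain integer values in MiB.

```python
import shutil
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for per-GPU memory usage as bare CSV (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse 'index, used, total' CSV rows into integer tuples.

    Assumes every field is a plain integer (hypothetical helper).
    """
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int, int]]:
    """Return (index, used MiB, total MiB) per GPU, or [] without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for index, used, total in query_gpu_memory():
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```

If the reported usage is already high before training starts, some other process or a leftover context is probably holding memory, even if it is not obvious from the process list.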

Additional

No response

Weihua Chen · Answer 1 · Tue Feb 28 2023 18:04:05 GMT+0800 (China Standard Time)

This error may be caused by insufficient CUDA memory. You can reduce the batch size to see if that works.
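The batch-size suggestion can be tried systematically by halving the batch size until one training step fits. This is only a sketch under stated assumptions: `find_fitting_batch_size` and the `try_step` callback are hypothetical names, not DAMO-YOLO APIs, and the code assumes an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory", as in the error reported above.

```python
from typing import Callable


def find_fitting_batch_size(
    try_step: Callable[[int], None], start: int, minimum: int = 1
) -> int:
    """Halve the batch size until try_step succeeds.

    try_step is a hypothetical callback that runs one training step at the
    given batch size and raises RuntimeError on an out-of-memory failure.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory error.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")
```

If even the minimum batch size fails, the batch size is probably not the cause, and the GPU state or device selection is worth checking instead.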

zqyJason · Answer 2 · Wed Mar 01 2023 10:00:13 GMT+0800 (China Standard Time)

I have tried it, but it doesn't work. It's really weird.