error during compile

Question

PraveenMNaik opened this issue a year ago

I get the following error when running train.py. Could you please look into it?
Traceback (most recent call last):
  File "/home3/REPOSITORIES/flexible-yolov5-2/scripts/train.py", line 39, in <module>
    import eval # for end-of-epoch mAP
  File "/home3/REPOSITORIES/flexible-yolov5-2/scripts/eval.py", line 26, in <module>
    from od.models.modules.experimental import attempt_load
ModuleNotFoundError: No module named 'od'

Bobo~ · Answer 1 · Thu Jun 15 2023 18:13:50 GMT+0800 (China Standard Time)

@PraveenMNaik Hi, you should cd into /home3/REPOSITORIES/flexible-yolov5-2 and then run the command from there.
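
The `od` package sits at the top level of the repository, next to `scripts/`, so Python can only import it when the repository root is on `sys.path`. A minimal sketch of one way to guarantee that from inside a script, assuming the layout implied by the tracebacks in this thread; the helper name `add_repo_root` is hypothetical and not part of the repository:

```python
import sys
from pathlib import Path


def add_repo_root(script_path, package="od"):
    """Walk up from script_path until a directory containing `package` is found,
    then put that directory at the front of sys.path.

    Returns the directory that was found, or None if no parent contains it.
    """
    start = Path(script_path).resolve()
    for parent in start.parents:
        if (parent / package).is_dir():
            root = str(parent)
            if root not in sys.path:
                sys.path.insert(0, root)
            return root
    return None
```

Calling `add_repo_root(__file__)` near the top of a script, before the first `from od...` import, would make the import independent of the directory the command is launched from.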

PraveenMNaik · Answer 2 · Thu Jun 15 2023 18:50:22 GMT+0800 (China Standard Time)

Thank you. Now I get the following error:
Traceback (most recent call last):
  File "scripts/train.py", line 24, in <module>
    from od.models.modules.experimental import attempt_load
  File "/home3/207it001/REPOSITORIES/yolov5-flexible/od/__init__.py", line 1, in <module>
    from .models import Model, ComputeLoss
  File "/home3/207it001/REPOSITORIES/yolov5-flexible/od/models/__init__.py", line 1, in <module>
    from .model import Model
  File "/home3/207it001/REPOSITORIES/yolov5-flexible/od/models/model.py", line 2, in <module>
    from addict import Dict
ModuleNotFoundError: No module named 'addict'

please reply

Bobo~ · Answer 3 · Fri Jun 16 2023 11:09:05 GMT+0800 (China Standard Time)

pip install addict
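
Missing packages tend to surface one traceback at a time, so it can save a few round trips to check several at once before starting a run. A minimal sketch using the standard library; the function name `missing_modules` is hypothetical, and the list of names is only an example built from the traceback above:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `addict` comes from the traceback above; extend the list with whatever
# else the project imports.
print(missing_modules(["addict"]))
```

Any name that is printed can then be installed with pip, as suggested above.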

PraveenMNaik · Answer 4 · Tue Jun 20 2023 14:19:11 GMT+0800 (China Standard Time)

Sir, I was able to fix all of those issues, but I am now stuck on the following one. Please help me resolve it.

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.

My GPU should be capable enough to handle this, and I even tried a batch size of 1. Please advise. VGG and MobileNet did work when PANet was kept. Why does it consume so much GPU memory?
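
One thing worth ruling out when an out-of-memory error appears even at batch size 1 is that the GPU memory is already taken before training starts, for example by another process on the same device. A minimal sketch, assuming a PyTorch version that provides `torch.cuda.mem_get_info`; the function name `gpu_memory_report` is hypothetical:

```python
def gpu_memory_report():
    """Return [(index, name, free_GiB, total_GiB), ...] for each visible GPU,
    or None when PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    gib = 2 ** 30
    report = []
    for index in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the device.
        free_bytes, total_bytes = torch.cuda.mem_get_info(index)
        report.append((index, torch.cuda.get_device_name(index),
                       free_bytes / gib, total_bytes / gib))
    return report


if __name__ == "__main__":
    print(gpu_memory_report())
```

If the free figure is far below the total before any model is loaded, the cause is more likely something else occupying the GPU than the model configuration.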

Bobo~ · Answer 5 · Tue Jun 20 2023 16:56:19 GMT+0800 (China Standard Time)

I need to know your model config.

PraveenMNaik · Answer 6 · Tue Jun 20 2023 18:10:23 GMT+0800 (China Standard Time)

Sir, as suggested in the README, I used the following command for the run:

python scripts/train.py --batch 16 --epochs 5 --data configs/data.yaml --cfg configs/model_XXX.yaml

I replaced data.yaml with my custom data file and tried all the models defined under configs/ (VGG, MobileNet, etc.). I didn't change anything else.

Bobo~ · Answer 7 · Wed Jun 21 2023 10:17:26 GMT+0800 (China Standard Time)

Do you mean that VGG and MobileNet work fine? I want to know which model causes the CUDA out-of-memory error. Is it the transformer backbone or the GNN?

PraveenMNaik · Answer 8 · Wed Jun 21 2023 21:09:14 GMT+0800 (China Standard Time)

@Bobo-y Yes, VGG and MobileNet run without any problem. However, the rest of the models defined in your repo crash with the CUDA error. I also noticed that GPU usage is high for VGG and MobileNet as well. Please let me know if anything needs to be modified before training. I didn't use the transformer backbone or the GNN either.

Bobo~ · Answer 9 · Thu Jun 22 2023 11:16:40 GMT+0800 (China Standard Time)

It's not easy to pin down the problem from this. What type of GPU do you have? Normally, yolo-s and resnet-18 should be able to run.

PraveenMNaik · Answer 10 · Thu Jun 22 2023 12:24:48 GMT+0800 (China Standard Time)

My GPUs are NVIDIA Tesla M40s with GM200 graphics processors.
I just used this command:
python scripts/train.py --batch 16 --epochs 5 --data configs/data.yaml --cfg configs/model_XXX.yaml
Are there any changes I need to make before running it?
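
Because the error message warns that the reported stack trace may be inaccurate, one low-risk change before the next run is to force synchronous kernel launches and reduce the batch size. A minimal sketch that builds such a run from the command above; `CUDA_LAUNCH_BLOCKING` and `CUDA_VISIBLE_DEVICES` are standard CUDA environment variables, while the helper `build_debug_run`, the GPU index `0`, and the batch size `1` are illustrative assumptions:

```python
import os
import sys


def build_debug_run(cfg, data="configs/data.yaml", batch=1, gpu="0"):
    """Return (command, env) for a small, synchronous training run."""
    env = dict(os.environ)
    env["CUDA_LAUNCH_BLOCKING"] = "1"   # report CUDA errors at the failing call
    env["CUDA_VISIBLE_DEVICES"] = gpu   # expose only the chosen GPU
    command = [sys.executable, "scripts/train.py",
               "--batch", str(batch), "--epochs", "5",
               "--data", data, "--cfg", cfg]
    return command, env


if __name__ == "__main__":
    # configs/model_XXX.yaml is the placeholder used in this thread;
    # substitute a real config file.
    command, env = build_debug_run("configs/model_XXX.yaml")
    print(" ".join(command))
```

The returned pair can be passed to `subprocess.run(command, env=env)` from the repository root. With synchronous launches the traceback should point at the call that actually failed.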

Bobo~ · Answer 11 · Thu Jul 13 2023 17:20:32 GMT+0800 (China Standard Time)

I think this error is caused by your environment, possibly the GPU.
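
To narrow down whether the environment or the GPU is at fault, it can help to record what PyTorch itself reports about the setup. A minimal sketch using standard `torch` calls; the function name `cuda_environment_summary` is hypothetical. If the device's compute capability is not covered by the architectures the installed PyTorch build was compiled for, that mismatch would be one candidate explanation, though nothing in this thread confirms it:

```python
def cuda_environment_summary():
    """Collect basic facts about the PyTorch/CUDA setup into a dict."""
    summary = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return summary
    summary["torch_installed"] = True
    summary["torch_version"] = torch.__version__
    summary["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    summary["cuda_available"] = torch.cuda.is_available()
    if summary["cuda_available"]:
        summary["device_name"] = torch.cuda.get_device_name(0)
        summary["compute_capability"] = torch.cuda.get_device_capability(0)
        summary["supported_archs"] = torch.cuda.get_arch_list()
    return summary


if __name__ == "__main__":
    for key, value in cuda_environment_summary().items():
        print(f"{key}: {value}")
```

Posting this output along with the model config gives the maintainer something concrete to compare against a working setup.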