Bobo-y / flexible-yolov5

A more readable and flexible yolov5, with more backbones (gcn, resnet, shufflenet, mobilenet, efficientnet, hrnet, swin-transformer, etc.), extra modules (cbam, dcn, and so on), and TensorRT support

C++ sample code for TRT inference

kaishijeng opened this issue

I am looking for C++ sample code for TRT inference. Does anyone know where I can find it?

Thanks,

commented

@kaishijeng OK, I'll add it later

@Bobo-y

Thanks for the help.

@Bobo-y

Is the C++ code completed? If yes, where can I download it?

Thanks,

image
Error since yesterday's update.

commented

@fjremnav sorry, not yet

commented

image Error since yesterday's update.

This may be because your batch size is too large, or the network variant you chose is too large.

commented

image

Do not use the new code to load model weights from the previous version; their FPN and PAN nodes are different.
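For illustration, a minimal, self-contained sketch (not the repo's own loading code) of how that mismatch shows up: when layer names differ between versions, as with the FPN/PAN changes, loading with strict=False reports the keys that no longer line up instead of failing blindly (shape mismatches on keys that do match still raise).

```python
from collections import OrderedDict
import torch.nn as nn

# Stand-in "old" and "new" models whose layer names differ, analogous to
# checkpoints saved before and after the FPN/PAN rework (names are hypothetical).
old_model = nn.Sequential(OrderedDict(stem=nn.Conv2d(3, 16, 3), pan_old=nn.Conv2d(16, 32, 3)))
new_model = nn.Sequential(OrderedDict(stem=nn.Conv2d(3, 16, 3), pan_new=nn.Conv2d(16, 32, 3)))

# strict=False skips the mismatched keys and reports them instead of raising.
result = new_model.load_state_dict(old_model.state_dict(), strict=False)
print("missing keys:", result.missing_keys)        # parameters the old checkpoint does not provide
print("unexpected keys:", result.unexpected_keys)  # old parameters the new model no longer has
```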

I used a batch size of 4 and 5 epochs, but I am still getting the CUDA out-of-memory issue. It was running smoothly yesterday.

Can I get the previous version? If yes, can you mail it to me at dulalatom@gmail.com?
Thanks

commented

It's in branch main, or in release v1.0.0. Due to the large number of updates, I have only verified the original yolov5 backbone; I am in the process of verifying the others.

commented

I trained efficientnet B1 and found no problems.

Ok thanks

image
I don't know why; I am using a batch size of 4 and 5 epochs. In this case I also got an error like this. Any suggestions to fix the error, please?

commented

Please give me your model config.

image
I have 22 classes

and this is for efficientnet b7. I experienced CUDA out of memory, but when I trained yesterday at this time it was working properly
image
Starting training for 5 epochs...

 Epoch   gpu_mem       box       obj       cls    labels  img_size

0% 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
File "scripts/train.py", line 665, in
main(opt)
File "scripts/train.py", line 561, in main
train(opt.hyp, opt, device, callbacks)
File "scripts/train.py", line 352, in train
pred = model(imgs) # forward
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./od/models/model.py", line 67, in forward
out = self.backbone(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./od/models/backbone/efficientnet/model.py", line 314, in forward
x = self.extract_endpoints(inputs)
File "./od/models/backbone/efficientnet/model.py", line 263, in extract_endpoints
x = block(x, drop_connect_rate=drop_connect_rate)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./od/models/backbone/efficientnet/model.py", line 105, in forward
x = self._bn0(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/batchnorm.py", line 179, in forward
self.eps,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2422, in batch_norm
input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 14.76 GiB total capacity; 12.60 GiB already allocated; 399.75 MiB free; 12.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:
wandb: Synced golden-hill-40: https://wandb.ai/ratom01/YOLOv5/runs/1c1fw4pt
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220703_121904-1c1fw4pt/logs

commented

There are some problems with hrnet in branch v2, so don't use hrnet for now; let me fix it. As for efficientnet b7: I trained on coco2017 (80 classes) with batch size 4 on an A100, and it used 20.7 GB of GPU memory, so your GPU can't support it.
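As a side note, the out-of-memory message above suggests one allocator knob; a rough sketch of using it, independent of this repo's scripts, is below. It only reduces fragmentation and cannot make a ~20 GB model fit on a ~15 GB GPU, so a smaller backbone, smaller input size, or smaller batch size is still the real fix.

```python
import os

# Must be set before the first CUDA allocation in the training entrypoint.
# This only limits allocator fragmentation, as the error message suggests;
# it does not shrink the model's memory footprint.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    # Check how much memory is actually free before launching a large backbone
    # (torch.cuda.mem_get_info is available in recent PyTorch versions).
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free: {free_bytes / 1e9:.1f} GB, total: {total_bytes / 1e9:.1f} GB")
```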

thank you for the help

commented

Indeed, the b7 in branch main takes about 10 GB of GPU memory, but the old FPN and PAN have some architectural issues (they are not exactly the same as the original yolov5, although this does not affect use). I think the code in v2 is correct, and I will verify it further soon.

Thanks. I am using a Colab GPU, so memory is limited. The day before yesterday I used the main branch and trained b1, but forgot to save the weights. I would like to run all the backbones; I am still learning to code and not at an expert level. Anyway, thanks for your help.

commented

About the error in your screenshot: I know why. The latest eval script has a bug; the eval script in main runs successfully.

commented

There is no problem when running python scripts/eval.py directly; the problem appears when it is called through scripts/train.py.

I also found an error in model.py and Shufflenetv2.py: from torchvision.models.utils import load_state_dict_from_url fails. This is solved by replacing it with from torch.hub import load_state_dict_from_url.
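One way to apply that fix while staying compatible with older torchvision releases is an import fallback; a minimal sketch:

```python
# torchvision removed load_state_dict_from_url from torchvision.models.utils in
# newer releases; torch.hub provides the same helper, so fall back to it.
try:
    from torchvision.models.utils import load_state_dict_from_url
except ImportError:
    from torch.hub import load_state_dict_from_url
```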

commented

As a temporary workaround, you can add --noval to the train command; after training, run python scripts/eval.py.

commented

Fix, Done!

I tried swin and got this error. Can you fix it, please?
image

commented

set half=False in eval.py
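For context, a small self-contained illustration (not the repo's eval code, and assuming the swin failure is a dtype mismatch) of the kind of error half precision can trigger, and why keeping everything in FP32, which is what half=False does, avoids it:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3)          # weights stay in FP32
x = torch.randn(1, 3, 32, 32)

try:
    conv(x.half())                 # FP16 input against FP32 weights -> RuntimeError
except RuntimeError as err:
    print("dtype mismatch:", err)

out = conv(x.float())              # all-FP32 path (the effect of half=False) works
print(out.shape)
```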

I will try to run all the backbones and will comment if I find any errors. Thanks

commented

Next time, please open a new issue; issue number 100 is coming! Hahaha

I tried setting half=False. It is still not fixed. Can you fix it, please?

commented

That's strange. I tested swin tiny and swin small; both work normally.

commented

Regarding the earlier efficientnet b7 out-of-memory issue: you can now train the b7 model. Please pull the new code; I fixed the PAN output channels, so it is now the same as the original yolov5 (U-yolov5).

commented

I am looking for C++ sample code for TRT inference. Does anyone know where I can find it?

Thanks,

Done, see scripts/trt_infer/cpp

commented

@Bobo-y

Is the C++ code completed? If yes, where can I download it?

Thanks,

Done, see scripts/trt_infer/cpp