SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Does forcing ONNX drop GPU support?

MyraBaba opened this issue · comments

Hi,

When I force the use of ONNX, I see the model is not using the GPU, only the CPU. Can this be made configurable?

Docker images are built with the CPU version of onnxruntime. Its intended use case is as a fallback when no GPU is available.
You can install onnxruntime-gpu, though in its latest versions you also have to pass a GPU execution provider as an argument to onnxruntime.InferenceSession.
But I highly recommend using TRT on GPU, since it's faster than onnxruntime, and there are also some image preprocessing optimizations in the TRT backend.
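For illustration, a minimal sketch of what that looks like with onnxruntime-gpu installed (model path is a placeholder, not IFR code):

```python
# Requires `pip install onnxruntime-gpu` instead of the CPU-only `onnxruntime`.
import onnxruntime

# Should include "CUDAExecutionProvider" if the GPU build and CUDA libs are found.
print(onnxruntime.get_available_providers())
print(onnxruntime.get_device())  # "GPU" for the GPU build, otherwise "CPU"

# Recent onnxruntime versions require the providers list to be passed explicitly;
# listing CPUExecutionProvider last keeps a fallback if CUDA is unavailable.
sess = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path to a detection/recognition model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # confirms which providers were actually registered
```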

Is there any speed / accuracy difference between TRT and ONNX?

It would be good if changing trt to onnx in deploy.sh (in the GPU version) automatically ran on GPU (onnxruntime-gpu).

TRT is significantly faster, especially with fp16 inference (force_fp16=True) on GPUs that support it. There is some accuracy degradation, but embeddings computed with TRT and onnxruntime usually have a similarity of around 0.99.

It's trivial to add support for onnxruntime-gpu, but I'm not sure it's actually useful, since TRT performs much better and, as I said before, there are optimizations in the IFR code for TRT inference.
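As a rough sketch of how the "around 0.99 similar" comparison can be checked (presumably cosine similarity between embeddings of the same aligned face from the two backends; the 512-d placeholder vectors below only keep the snippet runnable):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In practice emb_trt and emb_onnx would be embeddings of the same face crop
# returned by the TRT and onnxruntime backends; random placeholders are used here.
emb_trt = np.random.rand(512).astype(np.float32)
emb_onnx = emb_trt + np.random.normal(0, 0.01, 512).astype(np.float32)
print(cosine_similarity(emb_trt, emb_onnx))  # ~0.99 expected with fp16 TRT
```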

You should also provide the CUDA execution provider argument in the latest versions of onnxruntime.

Add it to all lines with onnxruntime.InferenceSession in onnxrt_backend.py, for example as sketched below.
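A hedged sketch of the kind of change meant here; the actual session-creation code and model paths in onnxrt_backend.py may look different:

```python
import onnxruntime

model_path = "models/onnx/arcface_r100_v1/arcface_r100_v1.onnx"  # illustrative path

# Before (CPU-only build, provider chosen implicitly):
# session = onnxruntime.InferenceSession(model_path)

# After (explicit CUDA provider with CPU fallback, required as an explicit
# argument in recent onnxruntime-gpu releases):
session = onnxruntime.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```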

That's pretty slow. What GPU and model parameters have you used?

Try enabling force_fp16 then; I'm getting around 145-150 img/sec with one worker and 10 client threads with fp16 enabled on an RTX 2080 Super for Stallone.jpg.
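For reference, a rough client-side throughput check along those lines: several threads repeatedly posting the same image to the REST service. The endpoint path, port, and payload shape below are assumptions about the IFR API, not verified against it; adjust them to match your deployment.

```python
import base64
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:18081/extract"  # assumed IFR endpoint and default port
THREADS = 10                            # matches the 10 client threads above
REQUESTS_PER_THREAD = 50

with open("Stallone.jpg", "rb") as f:
    payload = {"images": {"data": [base64.b64encode(f.read()).decode()]}}

def worker(_):
    # Each thread sends REQUESTS_PER_THREAD sequential requests.
    for _ in range(REQUESTS_PER_THREAD):
        requests.post(URL, json=payload, timeout=30).raise_for_status()
    return REQUESTS_PER_THREAD

start = time.time()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    total = sum(pool.map(worker, range(THREADS)))
print(f"{total / (time.time() - start):.1f} img/sec")
```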