OpenGVLab / InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Paper: https://arxiv.org/abs/2211.05778

Latency too high with DCNv3_pytorch op on CPU

xbkaishui opened this issue · comments

Hi guys,

When using DCNv3_pytorch for inference, the latency is too high on a CPU-only device.
I tested inference on both CUDA and CPU:
on the CUDA device, per-image latency is 29 ms with DCNv3_pytorch and 20 ms with the compiled DCNv3 CUDA op
on the CPU device, per-image latency is 550 ms (tested with the pure-PyTorch implementation only)
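For reference, a minimal sketch of how per-image latency numbers like these can be measured; the `dummy_forward` function is a hypothetical stand-in for a real forward pass, not the actual InternImage model:

```python
import time
import statistics

def measure_latency_ms(fn, warmup=5, iters=20):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)

# Hypothetical stand-in for one forward pass; in practice this would be
# `model(image)` on the CPU or CUDA device being compared.
def dummy_forward():
    sum(i * i for i in range(10_000))

print(f"median latency: {measure_latency_ms(dummy_forward):.2f} ms")
```

Note that when timing on CUDA, a `torch.cuda.synchronize()` call is needed before each timer read, since kernel launches are asynchronous and would otherwise be excluded from the measured time.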

Do you have any ideas for optimizing the CPU inference cost?

Thanks.