Is convolution node 486 in Faster R-CNN working fine?
Apisteftos opened this issue · comments
Bug Report
Which model does this pertain to?
Model: Faster R-CNN, opset 12
Describe the bug
I am profiling Faster R-CNN and computing the throughput of convolution node 486. The result is 1321 TOPs, which is far above the limits of the NVIDIA A100 GPU. Can somebody explain whether the model is working properly?
Reproduction instructions
System Information
OS Platform and Distribution (Linux Ubuntu 22.04):
ONNX version (1.14):
Backend/Runtime version (ONNX Runtime 1.15):
Here is my profiling data:
FP32
dur: 70
486_kernel_time
output_type_shape: (1, 256, 200, 392)
input_type_shape: (1, 256, 200, 392)
kernel_shape : (256, 256, 3, 3)
bias: 256
provider: CUDAExecutionProvider
op_name: Conv
Throughput: 1321 TOPs
Notes
The A100 spec lists peak FP32 throughput (non-Tensor Core) as 19.5 TFLOPS.
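
For reference, the throughput figure can be reproduced from the profiling data with a short calculation. This is only a sketch under stated assumptions: `dur` is taken to be in microseconds, each multiply-accumulate is counted as two operations, and the bias add is ignored. The helper name `conv_ops` is hypothetical and not part of ONNX Runtime.

```python
def conv_ops(output_shape, kernel_shape):
    """Operation count for a dense 2D convolution (2 ops per multiply-accumulate)."""
    n, c_out, h_out, w_out = output_shape
    k_out, c_in, k_h, k_w = kernel_shape
    assert c_out == k_out, "output channels must match the kernel's first dimension"
    # Each output element needs c_in * k_h * k_w multiply-accumulates.
    macs = n * c_out * h_out * w_out * c_in * k_h * k_w
    return 2 * macs


# Shapes copied from the profiling data for node 486.
ops = conv_ops((1, 256, 200, 392), (256, 256, 3, 3))

dur_s = 70e-6                 # assumption: dur = 70 is in microseconds
tops = ops / dur_s / 1e12     # tera-operations per second

peak_tflops = 19.5            # A100 peak FP32 (non-Tensor Core)
min_time_ms = ops / (peak_tflops * 1e12) * 1e3  # time needed at peak rate

print(f"ops per inference: {ops:,}")
print(f"implied throughput: {tops:.0f} TOPs")
print(f"time at {peak_tflops} TFLOPS peak: {min_time_ms:.2f} ms")
```

Under these assumptions the calculation gives about 1321 TOPs, so the arithmetic matches the number reported above. The same operation count running at the 19.5 TFLOPS peak would need roughly 4.7 ms rather than 70 µs.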