SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Convert SCRFD ONNX model to TensorRT

saeedkhanehgir opened this issue · comments

Hi,
Thanks for sharing this project.

I downloaded the SCRFD model from the link and tried to convert it to a TensorRT model with the /src/converters/modules/converters/onnx_to_trt.py script.

I used a custom convert.py script for this:

convert.py.txt

I get the following error:

[04/28/2022-13:10:23] [TRT] [E] 4: [network.cpp::validate::3011] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
Traceback (most recent call last):
  File "convert.py", line 9, in <module>
    convert_onnx(onnx_path,trt_path)
  File "/home/saeed.khanehgir/InsightFace-REST/src/converters/modules/converters/onnx_to_trt.py", line 83, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError
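
For context, this error means the ONNX graph was exported with dynamic input shapes, and TensorRT then refuses to build an engine until an optimization profile is defined. A minimal sketch of doing that with the plain TensorRT Python API (the input name 'input.1' and the 640x640 shape are assumptions; check network.get_input(0).name and .shape for the real values):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open('scrfd_10g_gnkps.onnx', 'rb') as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# 'input.1' and the shapes are assumptions; inspect the parsed network
# for the actual input name and dimensions.
profile.set_shape('input.1',
                  (1, 3, 640, 640),   # min
                  (1, 3, 640, 640),   # opt
                  (1, 3, 640, 640))   # max
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)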

Thanks

Hi! src/converters is outdated. You can try checking src/api_trt/modules/model_zoo/getter.py lines 146-163 for your desired use-case.

Something like this should work:

import onnx
from ..converters.onnx_to_trt import convert_onnx
from ..converters.reshape_onnx import reshape, reshape_onnx_input


onnx_path = 'scrfd_10g_gnkps.onnx'
trt_path = 'scrfd_10g_gnkps.plan'

model = onnx.load(onnx_path)
onnx_batch_size = 1
max_batch_size = 1
height, width = 640, 640
force_fp16 = True

# Rewrite the input to a static NCHW shape before building the engine.
reshaped = reshape(model, n=onnx_batch_size, h=height, w=width)
temp_onnx_model = reshaped.SerializeToString()

convert_onnx(temp_onnx_model,
             engine_file_path=trt_path,
             max_batch_size=max_batch_size,
             force_fp16=force_fp16)

(imports are relative to src/api_trt/modules/model_zoo)
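
Once the .plan file is built, a quick sanity check is to deserialize it and print the bindings to confirm the reshape took effect; a minimal sketch, using the binding-inspection API as in TensorRT 8.x:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
with open('scrfd_10g_gnkps.plan', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())
assert engine is not None

# Each binding should now report a static shape, e.g. (1, 3, 640, 640) input.
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))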

Thanks @SthPhoenix.
For the face embedding model, I use the code below to convert it to an FP16 .plan model:

import onnx
from modules.converters.onnx_to_trt import convert_onnx
from modules.converters.reshape_onnx import reshape

onnx_path = 'w600k_r50.onnx'
trt_path = 'w600k_r50.plan'

model = onnx.load(onnx_path)
onnx_batch_size = 1
reshaped = reshape(model, n=onnx_batch_size, h=112, w=112)
temp_onnx_model = reshaped.SerializeToString()

convert_onnx(temp_onnx_model,
             engine_file_path=trt_path,
             max_batch_size=1,
             force_fp16=True)

and I use the code below for inference:

import engine as eng          # local helper module from my project
import inference as inf       # local helper module providing allocate_buffers()
import tensorrt as trt
import numpy as np
import skimage.transform
import pycuda.driver as cuda
import pycuda.autoinit
from PIL import Image

serialized_plan_fp16 = 'w600k_r50.plan'
input_file_path = 'face.jpg'
HEIGHT=112
WIDTH=112


def rescale_image(image, output_shape, order=1):
    image = skimage.transform.resize(image, output_shape, order=order,
                                     preserve_range=True, mode='reflect')
    return image


def l2_normalize(x):
    return x / np.sqrt(np.sum(np.multiply(x, x)))


def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine


def load_images_to_buffer(pics, pagelocked_buffer):
    preprocessed = np.asarray(pics).ravel()
    np.copyto(pagelocked_buffer, preprocessed)


def do_inference(engine, pics_1, h_input_1, d_input_1, h_output, d_output, stream, batch_size):
    """
    Run inference on a single batch.

    Args:
        engine: Deserialized TensorRT engine.
        pics_1: Input image(s) for the model.
        h_input_1: Page-locked input buffer on the host.
        d_input_1: Input buffer on the device.
        h_output: Page-locked output buffer on the host.
        d_output: Output buffer on the device.
        stream: CUDA stream.
        batch_size: Batch size used for execution.

    Returns:
        The output array, reshaped to (batch_size, -1).
    """
    load_images_to_buffer(pics_1, h_input_1)

    with engine.create_execution_context() as context:
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(d_input_1, h_input_1, stream)

        # Run inference. Note: for explicit-batch engines,
        # context.execute_v2(bindings) is the preferred call.
        context.profiler = trt.Profiler()
        context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        # Synchronize the stream.
        stream.synchronize()
        # Return the host output reshaped to (batch_size, -1).
        out = h_output.reshape((batch_size, -1))
        return out



image = np.asarray(Image.open(input_file_path))
image = rescale_image(image, (112, 112), order=1)
im = np.array(image, dtype=np.float32, order='C')

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

engine = load_engine(trt_runtime, serialized_plan_fp16)
h_input, d_input, h_output, d_output, stream = inf.allocate_buffers(engine, 1, trt.float32)
out = do_inference(engine, im, h_input, d_input, h_output, d_output, stream, 1)
print('embedding',out[0])

I used this code for face verification but got bad results.
Do you see anything wrong?

Hi! First of all, the face image must be a properly aligned 112x112 crop produced by the detection step; you can't just take an arbitrary image containing a face and resize it to 112x112 (see the alignment sketch after the code below).
Secondly, you're missing the image preprocessing required for inference; for the w600k model it should be something like this:

img = cv2.imread("face.jpg", cv2.IMREAD_COLOR)
imgs = [img]

input_size = (112, 112)
input_std = 127.5
input_mean = 127.5
blob = cv2.dnn.blobFromImages(imgs, 1.0 / input_std, input_size,
                              (input_mean, input_mean, input_mean), swapRB=True)

...

out = do_inference(engine, blob, h_input, d_input, h_output, d_output, stream, 1)
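
For clarity, that blobFromImages call resizes to 112x112, normalizes with mean = std = 127.5, swaps BGR to RGB, and emits an NCHW tensor; a rough NumPy equivalent for a single already-resized image (an illustration, not code from this repo):

import numpy as np

# Assumes `img` is already a 112x112 BGR uint8 image.
rgb = img[:, :, ::-1].astype(np.float32)      # swapRB=True: BGR -> RGB
norm = (rgb - 127.5) / 127.5                  # (x - input_mean) * (1.0 / input_std)
blob = norm.transpose(2, 0, 1)[np.newaxis]    # HWC -> NCHW, shape (1, 3, 112, 112)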

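On the first point, a minimal sketch of warping a detected face to the canonical 112x112 crop from its five landmarks, using skimage's similarity transform (the destination template below is the standard ArcFace one; the `landmarks` argument is assumed to come from the detector):

import cv2
import numpy as np
from skimage.transform import SimilarityTransform

# Standard ArcFace 112x112 destination template for the five landmarks
# (left eye, right eye, nose tip, left mouth corner, right mouth corner).
ARCFACE_DST = np.array([[38.2946, 51.6963],
                        [73.5318, 51.5014],
                        [56.0252, 71.7366],
                        [41.5493, 92.3655],
                        [70.7299, 92.2041]], dtype=np.float32)

def align_face(img, landmarks):
    """Warp `img` to a 112x112 crop using 5 detected landmarks (5x2 array)."""
    tform = SimilarityTransform()
    tform.estimate(np.asarray(landmarks, dtype=np.float32), ARCFACE_DST)
    return cv2.warpAffine(img, tform.params[:2], (112, 112))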

Thanks @SthPhoenix
Solved.
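
For reference, verification with these embeddings then comes down to comparing two L2-normalized vectors by cosine similarity and thresholding; a minimal sketch (the 0.3 threshold is a placeholder to tune on your own data):

import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x)

def is_same_person(emb1, emb2, threshold=0.3):
    # Cosine similarity of L2-normalized embeddings; tune the threshold
    # on a validation set before relying on it.
    sim = float(np.dot(l2_normalize(emb1), l2_normalize(emb2)))
    return sim > threshold, sim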