
C++/C TensorRT Inference Example

This repository provides C++ and C examples that use TensorRT to run inference on models implemented in PyTorch, JAX, or TensorFlow.

This example is integrated into BART.

Requirements

Python 3.x
TensorRT
CUDA Toolkit
PyTorch
ONNX

Setup

Clone the repository:

git clone https://github.com/ggluo/TensorRT-Tiny-Cpp-Example.git
cd TensorRT-Tiny-Cpp-Example

Install torch, onnx, and onnxscript if they are not already installed:
```
pip install torch onnx onnxscript
```
Ensure that TensorRT and the CUDA Toolkit are installed on your system, and set the TensorRT paths accordingly in the makefile:
```
LDFLAGS = -L/path/to/TensorRT/lib
INCLUDEDIRS = -I/path/to/TensorRT/include
```

Running the Test

To run the test script, execute the following commands:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/TensorRT/lib
bash run_test.sh

This script performs the following steps:

Exports the ONNX model: python data/export_model.py data/model.onnx
Compiles the TensorRT inference code: make
Runs the TensorRT inference code: ./main data/model.onnx data/first_engine.trt

The exported ONNX model is written to data/model.onnx, and the resulting TensorRT engine is saved to data/first_engine.trt.
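
Once the engine file exists, a later run could in principle reuse it instead of rebuilding from ONNX. The sketch below covers only the file-handling half of that idea: reading a serialized engine into a byte buffer. The helper name `read_engine_file` is hypothetical and not part of this repository. The returned buffer is what one would hand to TensorRT's `IRuntime::deserializeCudaEngine`; that call is left out so the snippet compiles without TensorRT.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of this repository): read a serialized
// TensorRT engine file into memory. Returns an empty vector if the file
// cannot be opened, so an empty file and a missing file look the same.
std::vector<char> read_engine_file(const std::string& path) {
    assert(!path.empty() && "engine path must not be empty");

    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return {};
    }
    // Copy the whole stream, byte by byte, into the buffer.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

A caller might write `auto blob = read_engine_file("data/first_engine.trt");` and fall back to building from the ONNX model when `blob.empty()` is true.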

Overview of `main.cpp`

The main.cpp file contains the main entry point for the TensorRT inference code. Below is an overview of its functionality:

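#include "trt.h"
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    std::cout << "Hello World from TensorRT" << std::endl;

    // Expect two arguments: the ONNX model path and the engine output path
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <model.onnx> <engine.trt>" << std::endl;
        return 1;
    }

    // Set up the inference parameters from the command-line arguments
    infer_params params{argv[1], 1, argv[2], ""};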

    // Initialize TensorRT inference object
    trt_infer trt(params);
    trt.build();

    // Copy input data from host to device
    trt.CopyFromHostToDevice({0.5f, -0.5f, 1.0f}, 0, nullptr);

    // Perform inference
    trt.infer();

    // Copy output data from device to host
    std::vector<float> output(2, 0.0f);
    trt.CopyFromDeviceToHost(output, 1, nullptr);

    // Print output
    std::cout << "Output: " << output[0] << ", " << output[1] << std::endl;

    return 0;
}

This code performs the following steps:

Initializes the TensorRT inference parameters using command-line arguments.
Initializes the TensorRT inference object and builds the inference engine.
Copies input data from the host to the device.
Performs inference.
Copies output data from the device to the host.
Prints the output.
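
Printing the output only shows that inference ran. To check that the TensorRT result agrees with the original framework, one option is to compare the output against reference values within a tolerance. The following is a minimal sketch under stated assumptions: the helper `all_close` is hypothetical and not part of `trt.h`, the default tolerances are arbitrary, and the reference values would have to come from running the same input through the exported model in PyTorch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if both vectors have the same length and every
// pair of elements differs by at most atol + rtol * |expected|.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               float rtol = 1e-3f, float atol = 1e-5f) {
    assert(rtol >= 0.0f && atol >= 0.0f);

    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const float diff = std::fabs(actual[i] - expected[i]);
        // Written as !(diff <= bound) so that a NaN difference is rejected.
        if (!(diff <= atol + rtol * std::fabs(expected[i]))) {
            return false;
        }
    }
    return true;
}
```

In `main.cpp`, this could be called after `CopyFromDeviceToHost`, for example as `all_close(output, reference)`, where `reference` holds the two values PyTorch produced for the input `{0.5f, -0.5f, 1.0f}`.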

TODO

Check for memory leaks with valgrind
Add c_connector
