ggluo / TensorRT-Cpp-Example

C++/C TensorRT Inference Example for models created with PyTorch/JAX/TF

C++/C TensorRT Inference Example

This repository provides C++ and C examples that use TensorRT to run inference on models implemented in PyTorch, JAX, or TensorFlow.

This example is integrated into BART.


Requirements

  • Python 3.x
  • TensorRT
  • CUDA Toolkit
  • PyTorch
  • ONNX


Installation

  1. Clone the repository:

    git clone
    cd TensorRT-Tiny-Cpp-Example
  2. Install torch and onnx if they are not already installed:

    pip install torch onnx onnxscript
  3. Ensure that TensorRT and the CUDA Toolkit are installed on your system, and set their paths accordingly in the Makefile:

    LDFLAGS = -L/path/to/TensorRT/lib
    INCLUDEDIRS = -I/path/to/TensorRT/include
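
A quick way to check that the include path in INCLUDEDIRS is right is to ask the compiler whether it can see the TensorRT version header. The sketch below is not part of this repository: `tensorrt_header_version` is a hypothetical helper, and it assumes a C++17 compiler (for `__has_include`) and a TensorRT release that ships `NvInferVersion.h`. It compiles either way and reports "not found" when the header is not on the include path.

```cpp
// Sketch: report which TensorRT headers the compiler can see.
// Compile with the same include flag as the Makefile, for example:
//   g++ -std=c++17 -I/path/to/TensorRT/include -c check_trt.cpp
#include <string>

#if defined(__has_include)
#  if __has_include(<NvInferVersion.h>)
#    include <NvInferVersion.h>  // defines NV_TENSORRT_MAJOR / MINOR / PATCH
#    define HAVE_TRT_HEADERS 1
#  endif
#endif

// Hypothetical helper (an assumption for illustration, not a repository function).
// Returns "MAJOR.MINOR.PATCH" if the header was found, otherwise "not found".
std::string tensorrt_header_version() {
#ifdef HAVE_TRT_HEADERS
    return std::to_string(NV_TENSORRT_MAJOR) + "." +
           std::to_string(NV_TENSORRT_MINOR) + "." +
           std::to_string(NV_TENSORRT_PATCH);
#else
    return "not found";
#endif
}
```

Printing the result of `tensorrt_header_version()` from a small `main` shows whether the path is picked up. This only checks the headers; the library path in LDFLAGS is verified when linking.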

Running the Test

To run the test script, execute the following commands:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/TensorRT/lib

This script performs the following steps:

  1. Exports the ONNX model: python data/ data/model.onnx
  2. Compiles the TensorRT inference code: make
  3. Runs the TensorRT inference code: ./main data/model.onnx data/first_engine.trt

The provided ONNX model is located at data/model.onnx, and the resulting TensorRT engine will be saved to data/first_engine.trt.
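
After a run, one simple sanity check is that the engine file exists and is not empty. The following is a minimal sketch, assuming only the standard library; `file_size_bytes` and `engine_file_ready` are hypothetical helpers, not functions from this repository, and a non-empty file does not prove that the engine is valid.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: size of the file at `path` in bytes,
// or -1 if the file cannot be opened.
long long file_size_bytes(const std::string& path) {
    // Open in binary mode, positioned at the end, so tellg() yields the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    return static_cast<long long>(in.tellg());
}

// Hypothetical helper: true if the file exists and holds at least one byte.
bool engine_file_ready(const std::string& path) {
    return file_size_bytes(path) > 0;
}
```

For example, `engine_file_ready("data/first_engine.trt")` should return true once the run has completed.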

Overview of main.cpp

The main.cpp file contains the main entry point for the TensorRT inference code. Below is an overview of its functionality:

#include "trt.h"
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    std::cout << "Hello World from TensorRT" << std::endl;

    // Expect the ONNX model path and the engine output path
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <model.onnx> <engine.trt>" << std::endl;
        return 1;
    }

    // Parse command-line arguments
    infer_params params{argv[1], 1, argv[2], ""};

    // Initialize TensorRT inference object
    trt_infer trt(params);

    // Copy input data from host to device
    trt.CopyFromHostToDevice({0.5f, -0.5f, 1.0f}, 0, nullptr);

    // Perform inference

    // Copy output data from device to host
    std::vector<float> output(2, 0.0f);
    trt.CopyFromDeviceToHost(output, 1, nullptr);

    // Print output
    std::cout << "Output: " << output[0] << ", " << output[1] << std::endl;

    return 0;
}

This code performs the following steps:

  1. Initializes the TensorRT inference parameters using command-line arguments.
  2. Initializes the TensorRT inference object and builds the inference engine.
  3. Copies input data from the host to the device.
  4. Performs inference.
  5. Copies output data from the device to the host.
  6. Prints the output.
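
The call order in the steps above can be tried without a GPU by using a CPU-only stand-in. Everything in this sketch is hypothetical: `mock_infer`, its methods, `run_once`, and the toy computation are assumptions for illustration and are not the repository's `trt_infer` API. Only the order of the calls and the use of index 0 for the input and index 1 for the output mirror main.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical CPU-only stand-in; NOT the repository's trt_infer class.
class mock_infer {
public:
    // Allocate one zero-filled buffer per binding (sizes in floats).
    explicit mock_infer(const std::vector<std::size_t>& binding_sizes) {
        for (std::size_t n : binding_sizes) {
            buffers_.emplace_back(n, 0.0f);
        }
    }

    // Stands in for the host-to-device copy into binding `index`.
    void copy_from_host(const std::vector<float>& src, std::size_t index) {
        std::vector<float>& dst = buffers_.at(index);
        if (src.size() != dst.size()) {
            throw std::invalid_argument("input size mismatch");
        }
        dst = src;
    }

    // Toy "model" (an assumption): binding 1 receives {sum, max} of binding 0.
    void infer() {
        const std::vector<float>& in = buffers_.at(0);
        std::vector<float>& out = buffers_.at(1);
        if (in.empty() || out.size() != 2) {
            throw std::logic_error("unexpected binding shapes");
        }
        out[0] = std::accumulate(in.begin(), in.end(), 0.0f);
        out[1] = *std::max_element(in.begin(), in.end());
    }

    // Stands in for the device-to-host copy out of binding `index`.
    void copy_to_host(std::vector<float>& dst, std::size_t index) const {
        const std::vector<float>& src = buffers_.at(index);
        if (dst.size() != src.size()) {
            throw std::invalid_argument("output size mismatch");
        }
        dst = src;
    }

private:
    std::vector<std::vector<float>> buffers_;
};

// Follows the same order as main.cpp: set up, copy in, infer, copy out.
std::vector<float> run_once(const std::vector<float>& input) {
    std::vector<std::size_t> sizes{input.size(), 2};
    mock_infer trt(sizes);
    trt.copy_from_host(input, 0);
    trt.infer();
    std::vector<float> output(2, 0.0f);
    trt.copy_to_host(output, 1);
    return output;
}
```

With the input used in main.cpp, `run_once({0.5f, -0.5f, 1.0f})` returns the toy model's sum and maximum, both 1.0. The real engine will of course produce whatever the exported model computes.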


TODO

  1. Check for memory leaks with valgrind
  2. Add c_connector

