Parses ONNX models for execution with TensorRT.
See also the TensorRT documentation.
Development on the master branch targets the latest version of TensorRT (5.1).
For versions < 5.1, clone and build from the 5.0 branch.
Currently supported ONNX operators are listed in the operator support matrix.
Clone the code from GitHub.
git clone --recursive https://github.com/onnx/onnx-tensorrt.git
The TensorRT-ONNX executables and libraries are built with CMake. By default, CMake tells the CUDA compiler to generate code for the latest SM version. If you are using a GPU with a lower SM version, you can specify which SMs to build for with the optional -DGPU_ARCHS flag. For example, if you have a GTX 1080, you can specify -DGPU_ARCHS="61" to generate CUDA code specifically for that card.
To find the maximum compute capability your specific GPU supports, see NVIDIA's list of CUDA GPUs.
mkdir build
cd build
cmake .. -DTENSORRT_ROOT=<tensorrt_install_dir>
OR
cmake .. -DTENSORRT_ROOT=<tensorrt_install_dir> -DGPU_ARCHS="61"
make -j8
sudo make install
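The relationship between a GPU's compute capability and the -DGPU_ARCHS value can be sketched in Python. This is only an illustration: the helper names and the /opt/tensorrt path are hypothetical, and it handles a single architecture, matching the GTX 1080 example above (compute capability 6.1 becomes "61").

```python
def gpu_archs_value(compute_capability):
    """Turn a compute capability such as "6.1" into the SM number "61"."""
    major, minor = compute_capability.split(".")
    return f"{int(major)}{int(minor)}"


def cmake_args(tensorrt_root, compute_capability=None):
    """Assemble the cmake invocation used in the build steps above.

    tensorrt_root stands in for <tensorrt_install_dir>. When no compute
    capability is given, the flag is omitted and the build default applies.
    """
    args = ["cmake", "..", f"-DTENSORRT_ROOT={tensorrt_root}"]
    if compute_capability is not None:
        args.append(f"-DGPU_ARCHS={gpu_archs_value(compute_capability)}")
    return args


# Hypothetical install path, shown only to print the resulting command.
print(" ".join(cmake_args("/opt/tensorrt", "6.1")))
```

The list returned by cmake_args can be passed to subprocess.run from inside the build directory, which avoids shell quoting of the flag value.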
ONNX models can be converted to serialized TensorRT engines using the onnx2trt executable:
onnx2trt my_model.onnx -o my_engine.trt
ONNX models can also be converted to human-readable text:
onnx2trt my_model.onnx -t my_model.onnx.txt
See more usage information by running:
onnx2trt -h
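When conversion is part of a larger script, the same commands can be driven from Python. The sketch below is not part of this project: the function names are hypothetical, it uses only the -o and -t options shown above, and it assumes onnx2trt is on the PATH. Passing both options in one call is also an assumption; run onnx2trt -h to confirm what your build accepts.

```python
import subprocess


def onnx2trt_command(model_path, engine_path=None, text_path=None):
    """Build the onnx2trt argument list for the requested outputs."""
    cmd = ["onnx2trt", model_path]
    if engine_path is not None:
        cmd += ["-o", engine_path]  # serialized TensorRT engine
    if text_path is not None:
        cmd += ["-t", text_path]    # human-readable text dump of the model
    return cmd


def convert(model_path, engine_path=None, text_path=None):
    """Run onnx2trt and raise CalledProcessError if it exits with an error."""
    cmd = onnx2trt_command(model_path, engine_path, text_path)
    subprocess.run(cmd, check=True)


# Only print the command here; call convert(...) to actually run it.
print(" ".join(onnx2trt_command("my_model.onnx", engine_path="my_engine.trt")))
```

Using an argument list rather than a shell string means file names with spaces do not need extra quoting.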
The TensorRT backend for ONNX can be used in Python as follows:
import onnx
import onnx_tensorrt.backend as backend
import numpy as np
model = onnx.load("/path/to/model.onnx")
engine = backend.prepare(model, device='CUDA:1')
input_data = np.random.random(size=(32, 3, 224, 224)).astype(np.float32)
output_data = engine.run(input_data)[0]
print(output_data)
print(output_data.shape)
The model parser library, libnvonnxparser.so, has its C++ API declared in this header:
NvOnnxParser.h
TensorRT engines built using this parser must use the plugin factory provided in libnvonnxparser_runtime.so, which has its C++ API declared in this header:
NvOnnxParserRuntime.h
Python bindings for the ONNX-TensorRT parser in TensorRT versions >= 5.0 are packaged in the shipped .whl files. Install them with:
pip install <tensorrt_install_dir>/python/tensorrt-5.1.6.0-cp27-none-linux_x86_64.whl
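The wheel file name encodes which interpreter it targets, following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The sketch below, with a hypothetical helper name, splits the file name from the command above into those fields; the cp27 tag indicates a wheel built for CPython 2.7. Whether your TensorRT install ships wheels for other interpreters is not stated here, so check the python directory of your install.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its standard components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("tensorrt-5.1.6.0-cp27-none-linux_x86_64.whl")
print(info["python"], info["platform"])
```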
For earlier versions of TensorRT, the Python wrappers are built using SWIG. Build the Python wrappers and modules by running:
python setup.py build
sudo python setup.py install
Build the onnx_tensorrt Docker image by running:
cp /path/to/TensorRT-5.1.*.tar.gz .
docker build -t onnx_tensorrt .
After installation (or inside the Docker container), ONNX backend tests can be run as follows:
Real model tests only:
python onnx_backend_test.py OnnxBackendRealModelTest
All tests:
python onnx_backend_test.py
You can use the -v flag to make the output more verbose.
Pre-trained models in ONNX format can be found at the ONNX Model Zoo.