NLP on the NVIDIA Jetson Platform
Benchmark (tokens/sec):
| Model | PyTorch-GPU | PyTorch-CPU (4 cores) | onnx-CPU (4 cores) | onnx-CUDA | onnx-TRT |
|---|---|---|---|---|---|
| DistilBERT SQuAD | 462 | 61 | 107 | 605 | |
| Name | Version |
|---|---|
| JetPack | 4.4 |
| ONNX Runtime | 1.3 |
https://gist.github.com/arijitx/c20379394852242a2fa03f76b9ee4e4f
Find a better version here: https://benjcunningham.org/installing-transformers-on-jetson-nano.html
Install sentencepiece
```
git clone https://github.com/google/sentencepiece
cd sentencepiece
mkdir build
cd build
cmake ..
make -j $(nproc)
sudo make install
sudo ldconfig -v
cd ../python
python3 setup.py install
```
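A quick sanity check that the Python bindings installed correctly:

```python
# Verify the sentencepiece Python bindings import and report their version.
import sentencepiece
print(sentencepiece.__version__)
```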
Install Rust (needed to build tokenizers from source on aarch64):
```
curl https://sh.rustup.rs -sSf | sh
rustc --version
```
Restart your shell so that cargo is on the PATH, then install tokenizers:
```
pip3 install tokenizers
```
Install transformers
```
pip3 install transformers
```
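To verify the stack before converting anything, you can run the same QA model through the plain PyTorch pipeline (this is the PyTorch baseline from the table above; the question/context strings here are placeholders, and the model downloads on first run):

```python
# Smoke test: run DistilBERT-SQuAD through the stock transformers pipeline.
from transformers import pipeline

nlp = pipeline(
    "question-answering",
    model="distilbert-base-uncased-distilled-squad",
    tokenizer="distilbert-base-uncased-distilled-squad",
)
print(nlp(question="What is JetPack?", context="JetPack is NVIDIA's SDK for the Jetson platform."))
```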
Build and install CMake 3.14 (the CMake shipped with JetPack is too old to build ONNX Runtime):
```
sudo apt-get install libssl-dev
```
Download cmake-3.14.0 from https://cmake.org/download/, then:
```
tar -zxvf cmake-3.14.0.tar.gz
cd cmake-3.14.0
sudo ./bootstrap   # takes ~20 minutes
sudo make
sudo make install
cmake --version    # verify the new version is picked up
```
Jetson TX1/TX2/Nano (ARM64 Builds)
https://github.com/microsoft/onnxruntime/blob/master/BUILD.md#TensorRT
ONNX Runtime v1.2.0 and higher requires TensorRT 7 support; at the moment, the compatible TensorRT and CUDA libraries in JetPack 4.4 are still at the developer-preview stage. We therefore suggest using ONNX Runtime v1.1.2 with JetPack 4.3, which has been validated.
Clone ONNX Runtime:
```
git clone --recursive https://github.com/Microsoft/onnxruntime
```
Optionally point the build at the CUDA compiler (CMake can usually find CUDA on its own):
```
export CUDACXX="/usr/local/cuda/bin/nvcc"
```
Modify tools/ci_build/build.py to turn onnxruntime_DEV_MODE off:
```diff
- "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "ON"),
+ "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "OFF"),
```
Modify cmake/CMakeLists.txt to generate CUDA code for the Jetson GPU architectures:
```diff
- set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_50,code=sm_50") # M series
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_53,code=sm_53") # Jetson TX1/Nano
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_62,code=sm_62") # Jetson TX2
```
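If you are unsure which compute capability your board has, you can query it through PyTorch (a quick check, assuming the PyTorch wheel used for the benchmarks above is installed):

```python
# Query the CUDA compute capability of the Jetson's GPU via PyTorch.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute_{major}{minor}")  # e.g. compute_53 on a Jetson Nano/TX1
```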
Build onnxruntime with the --use_tensorrt flag:
```
./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
```
Install the wheel:
```
cd onnxruntime/build/Linux/Release/dist
pip3 install onnxruntime_gpu_tensorrt-1.3.1-cp36-cp36m-linux_aarch64.whl
```
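To confirm the wheel actually exposes the TensorRT and CUDA execution providers, a minimal check:

```python
# List the execution providers compiled into this ONNX Runtime build.
import onnxruntime as ort

print(ort.get_device())               # "GPU" for a CUDA/TensorRT build
print(ort.get_available_providers())  # should include TensorrtExecutionProvider
```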
See the ONNX Runtime build instructions above for additional information and tips.
Export the model to ONNX, following https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb
Update the export() call in src/transformers/convert_graph_to_onnx.py:
```python
export(
    nlp.model,
    model_args,
    f=output,
    input_names=ordered_input_names,
    output_names=output_names,
    dynamic_axes=dynamic_axes,
    do_constant_folding=True,
    opset_version=opset,
)
```
Dump the ONNX model:
```
python3 convert_graph_to_onnx.py onnx/dbert_squad.onnx --pipeline question-answering --model distilbert-base-uncased-distilled-squad --tokenizer distilbert-base-uncased-distilled-squad --framework pt
```
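Once exported, the model can be exercised directly through ONNX Runtime. A minimal sketch: the input names (input_ids, attention_mask) and the output order (start logits, then end logits) are what the conversion script typically produces, but verify them against sess.get_inputs() and sess.get_outputs() on your own export:

```python
# Minimal QA inference sketch over the exported ONNX graph.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
sess = ort.InferenceSession("onnx/dbert_squad.onnx")  # uses the highest-priority available provider

question = "What runs the model?"
context = "The model runs on a Jetson Nano with ONNX Runtime."
enc = tokenizer.encode_plus(question, context)
input_ids = np.array([enc["input_ids"]], dtype=np.int64)
attention_mask = np.array([enc["attention_mask"]], dtype=np.int64)

start_logits, end_logits = sess.run(
    None, {"input_ids": input_ids, "attention_mask": attention_mask}
)
start, end = int(np.argmax(start_logits)), int(np.argmax(end_logits))
print(tokenizer.decode(enc["input_ids"][start : end + 1]))
```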
Run benchmark.py
https://gist.github.com/arijitx/1400d3d4e07fc517d6c5bfea506c2353
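The gist holds the script behind the numbers in the table; at its core, a tokens/sec measurement is just a timed loop. A rough sketch (not the gist verbatim; run_once and n_tokens are stand-ins for your session call and input length):

```python
# Rough tokens/sec measurement: time repeated forward passes and
# divide the number of processed tokens by the elapsed time.
import time

def tokens_per_sec(run_once, n_tokens, warmup=5, iters=50):
    for _ in range(warmup):   # warm-up iterations (TensorRT builds its engines lazily)
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return n_tokens * iters / elapsed

# Example: rate = tokens_per_sec(lambda: sess.run(None, feeds), input_ids.shape[1])
```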
Convert your ONNX model into a shape-inferred model for the TensorRT execution provider:
https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/TensorRT-ExecutionProvider.md#shape-inference-for-tensorrt-subgraphs
The script lives inside the cloned onnxruntime repo (the path below is from my checkout):
```
python3 /home/arijitx/onnxruntime/onnxruntime/core/providers/nuphar/scripts/symbolic_shape_infer.py --input dbert_squad.onnx --output dbert_squad_trt.onnx
```
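As a final check, load the shape-inferred model and confirm the TensorRT execution provider is active for the session:

```python
# Confirm the TensorRT execution provider is active for the shape-inferred model.
import onnxruntime as ort

sess = ort.InferenceSession("dbert_squad_trt.onnx")
print(sess.get_providers())  # TensorrtExecutionProvider should appear if the EP was compiled in
```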