DeepStream-Yolo
NVIDIA DeepStream SDK 6.0 configuration for YOLO models
Future updates (coming soon, stay tuned)
- New documentation for multiple models
- DeepStream tutorials
- Native PP-YOLO support
- GPU NMS
- Dynamic batch-size
Improvements on this repository
- Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or another file)
- Support for new_coords, beta_nms and scale_x_y params
- Support for new models
- Support for new layers
- Support for new activations
- Support for convolutional groups
- Support for INT8 calibration
- Support for non square models
- Support for reorg, implicit and channel layers (YOLOR)
- YOLOv5 6.0 native support
- YOLOR native support
- Models benchmarks
Getting started
- Requirements
- Tested models
- Benchmarks
- dGPU installation
- Basic usage
- YOLOv5 usage
- YOLOR usage
- INT8 calibration
- Using your custom model
Requirements
- Ubuntu 18.04
- CUDA 11.4.3
- TensorRT 8.0 GA (8.0.1)
- cuDNN >= 8.2
- NVIDIA Driver >= 470.63.01
- NVIDIA DeepStream SDK 6.0
- DeepStream-Yolo
For YOLOv5 and YOLOR:
Tested models
Benchmarks
nms = 0.45 (changed to beta_nms when used in Darknet cfg file) / 0.6 (YOLOv5 and YOLOR models)
pre-cluster-threshold = 0.001 (mAP eval) / 0.25 (FPS measurement)
batch-size = 1
valid = val2017 (COCO) - 1000 random images for INT8 calibration
sample = 1920x1080 video
NOTE: Used maintain-aspect-ratio=1 in config_infer file for YOLOv4 (with letter_box=1), YOLOv5 and YOLOR models.
NVIDIA GTX 1050 4GB (Mobile)
YOLOR-CSP performance comparison
Metric | DeepStream | PyTorch |
---|---|---|
FPS (without display) | 13.32 | 10.07 |
FPS (with display) | 12.63 | 9.41 |
YOLOv5n performance comparison
Metric | DeepStream | TensorRTx | Ultralytics |
---|---|---|---|
FPS (without display) | 110.25 | 87.42 | 97.19 |
FPS (with display) | 105.62 | 73.07 | 50.37 |
More
DeepStream | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (without display) |
---|---|---|---|---|---|---|
YOLOR-P6 | FP32 | 1280 | 0.478 | 0.663 | 0.519 | 5.53 |
YOLOR-CSP-X* | FP32 | 640 | 0.473 | 0.664 | 0.513 | 7.59 |
YOLOR-CSP-X | FP32 | 640 | 0.470 | 0.661 | 0.507 | 7.52 |
YOLOR-CSP* | FP32 | 640 | 0.459 | 0.652 | 0.496 | 13.28 |
YOLOR-CSP | FP32 | 640 | 0.449 | 0.639 | 0.483 | 13.32 |
YOLOv5x6 6.0 | FP32 | 1280 | 0.504 | 0.681 | 0.547 | 2.22 |
YOLOv5l6 6.0 | FP32 | 1280 | 0.492 | 0.670 | 0.535 | 4.05 |
YOLOv5m6 6.0 | FP32 | 1280 | 0.463 | 0.642 | 0.504 | 7.54 |
YOLOv5s6 6.0 | FP32 | 1280 | 0.394 | 0.572 | 0.424 | 18.64 |
YOLOv5n6 6.0 | FP32 | 1280 | 0.294 | 0.452 | 0.314 | 26.94 |
YOLOv5x 6.0 | FP32 | 640 | 0.469 | 0.654 | 0.509 | 8.24 |
YOLOv5l 6.0 | FP32 | 640 | 0.450 | 0.634 | 0.487 | 14.96 |
YOLOv5m 6.0 | FP32 | 640 | 0.415 | 0.601 | 0.448 | 28.30 |
YOLOv5s 6.0 | FP32 | 640 | 0.334 | 0.516 | 0.355 | 63.55 |
YOLOv5n 6.0 | FP32 | 640 | 0.250 | 0.417 | 0.260 | 110.25 |
YOLOv4-P6 | FP32 | 1280 | 0.499 | 0.685 | 0.542 | 2.57 |
YOLOv4-P5 | FP32 | 896 | 0.472 | 0.659 | 0.513 | 5.48 |
YOLOv4-CSP-X-SWISH | FP32 | 640 | 0.473 | 0.664 | 0.513 | 7.51 |
YOLOv4-CSP-SWISH | FP32 | 640 | 0.459 | 0.652 | 0.496 | 13.13 |
YOLOv4x-MISH | FP32 | 640 | 0.459 | 0.650 | 0.495 | 7.53 |
YOLOv4-CSP | FP32 | 640 | 0.440 | 0.632 | 0.474 | 13.19 |
YOLOv4 | FP32 | 608 | 0.498 | 0.740 | 0.549 | 12.18 |
YOLOv4-Tiny | FP32 | 416 | 0.215 | 0.403 | 0.206 | 201.20 |
YOLOv3-SPP | FP32 | 608 | 0.411 | 0.686 | 0.433 | 12.22 |
YOLOv3-Tiny-PRN | FP32 | 416 | 0.167 | 0.382 | 0.125 | 277.14 |
YOLOv3 | FP32 | 608 | 0.377 | 0.672 | 0.385 | 12.51 |
YOLOv3-Tiny | FP32 | 416 | 0.095 | 0.203 | 0.079 | 218.42 |
YOLOv2 | FP32 | 608 | 0.286 | 0.541 | 0.273 | 25.28 |
YOLOv2-Tiny | FP32 | 416 | 0.102 | 0.258 | 0.061 | 231.36 |
dGPU installation
To install DeepStream on a dGPU (x86 platform) without Docker, a few steps are needed to prepare the computer.
1. Disable Secure Boot in BIOS
If you are using a laptop with a newer Intel/AMD processor, update the kernel to a newer version.
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100_5.11.0-051100.202102142330_all.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-image-unsigned-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-modules-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
sudo dpkg -i *.deb
sudo reboot
2. Install dependencies
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt install libssl1.0.0 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4
sudo apt-get install linux-headers-$(uname -r)
NOTE: Install DKMS if you are using the default Ubuntu kernel
sudo apt-get install dkms
NOTE: Purge any existing NVIDIA driver, CUDA, etc. installations before continuing.
3. Disable Nouveau
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
- Add
blacklist nouveau
options nouveau modeset=0
- Run
sudo update-initramfs -u
4. Reboot the computer
sudo reboot
5. Download and install NVIDIA Driver without xconfig
wget https://us.download.nvidia.com/tesla/470.82.01/NVIDIA-Linux-x86_64-470.82.01.run
sudo sh NVIDIA-Linux-x86_64-470.82.01.run
NOTE: If you are using the default Ubuntu kernel, enable DKMS during the installation. Otherwise, you can skip this driver installation and install the NVIDIA driver from the CUDA runfile (next step).
6. Download and install CUDA 11.4.3 without NVIDIA Driver
wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run
- Export environment variables
nano ~/.bashrc
- Add
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
- Run
source ~/.bashrc
sudo ldconfig
NOTE: If you are using a laptop with NVIDIA Optimus, run
sudo apt-get install nvidia-prime
sudo prime-select nvidia
7. Download from NVIDIA website and install the TensorRT 8.0 GA (8.0.1)
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda-repo.list
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-key add 7fa2af80.pub
sudo apt-get update
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626/7fa2af80.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.0.1-1+cuda11.3 libnvinfer-plugin8=8.0.1-1+cuda11.3 libnvparsers8=8.0.1-1+cuda11.3 libnvonnxparsers8=8.0.1-1+cuda11.3 libnvinfer-bin=8.0.1-1+cuda11.3 libnvinfer-dev=8.0.1-1+cuda11.3 libnvinfer-plugin-dev=8.0.1-1+cuda11.3 libnvparsers-dev=8.0.1-1+cuda11.3 libnvonnxparsers-dev=8.0.1-1+cuda11.3 libnvinfer-samples=8.0.1-1+cuda11.3 libnvinfer-doc=8.0.1-1+cuda11.3
8. Download from NVIDIA website and install the DeepStream SDK 6.0
sudo apt-get install ./deepstream-6.0_6.0.0-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
9. Reboot the computer
sudo reboot
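After the reboot, the installed driver can be compared against the minimum listed under Requirements (NVIDIA Driver >= 470.63.01). The following is a minimal sketch, not part of this repository: the helper names parse_version, meets_minimum and installed_driver_version are hypothetical, and the last one assumes nvidia-smi is on the PATH.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '470.82.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version (needs the NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # nvidia-smi prints one line per GPU; the first line is enough here.
    return result.stdout.splitlines()[0].strip()
```

For example, meets_minimum(installed_driver_version(), "470.63.01") should return True for the 470.82.01 driver installed in step 5.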
Basic usage
1. Download the repo
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo
2. Download the cfg and weights files for your model and move them to the DeepStream-Yolo folder
3. Compile lib
- x86 platform
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
- Jetson platform
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
4. Edit config_infer_primary.txt for your model (example for YOLOv4)
[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
5. Run
deepstream-app -c deepstream_app_config.txt
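Before running, the keys edited in step 4 can be sanity-checked with Python's configparser. This is only a sketch under stated assumptions: check_infer_config and PRECISION_BY_MODE are hypothetical names, and the rule that the engine file name ends in _fp32, _int8 or _fp16 is inferred from the file names used in this README. It expects a complete config file, since configparser rejects the "..." placeholder lines shown in the excerpt above.

```python
import configparser

# Precision names keyed by network-mode, following the config comment
# "0=FP32, 1=INT8, 2=FP16 mode".
PRECISION_BY_MODE = {0: "fp32", 1: "int8", 2: "fp16"}


def check_infer_config(text):
    """Return a list of human-readable problems found in a config_infer file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("property"):
        return ["missing [property] section"]

    problems = []
    prop = parser["property"]
    for key in ("custom-network-config", "model-file", "labelfile-path"):
        if key not in prop:
            problems.append(f"missing key: {key}")

    mode = prop.getint("network-mode", fallback=0)
    precision = PRECISION_BY_MODE.get(mode)
    if precision is None:
        problems.append(f"unknown network-mode: {mode}")

    engine = prop.get("model-engine-file", fallback="")
    # Assumption: the engine name ends in _<precision>.engine, as in the examples.
    if precision and engine and not engine.endswith(f"_{precision}.engine"):
        problems.append(f"engine file {engine!r} does not match network-mode={mode}")
    return problems
```

Calling check_infer_config(open("config_infer_primary.txt").read()) returns an empty list when none of these checks fail.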
NOTE: If you want to use YOLOv2 or YOLOv2-Tiny models, change the deepstream_app_config.txt file before running it
...
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt
NOTE: The config_infer_primary.txt file uses cluster-mode=4 and NMS = 0.45 (via code) when beta_nms isn't available (when beta_nms is available, NMS = beta_nms), while the config_infer_primary_yoloV2.txt file uses cluster-mode=2 and nms-iou-threshold=0.45 to set NMS.
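To illustrate what the two thresholds control, here is a generic greedy NMS in plain Python. It is a sketch for intuition only and is not the code used by this repository or by DeepStream: the function names and the (left, top, width, height) box layout are assumptions. The defaults mirror pre-cluster-threshold=0.25 and NMS = 0.45.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thresh=0.25, iou_thresh=0.45):
    """Greedy NMS over (box, score) pairs that all belong to one class."""
    # Drop low-confidence boxes first, then visit the rest from best to worst.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(box, other) <= iou_thresh for other, _ in kept):
            kept.append((box, score))
    return kept
```

A lower IoU threshold suppresses more overlapping boxes, while a lower confidence threshold lets more candidates reach the NMS stage.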
YOLOv5 usage
1. Copy gen_wts_yoloV5.py from DeepStream-Yolo/utils to ultralytics/yolov5 folder
2. Open the ultralytics/yolov5 folder
3. Download pt file from ultralytics/yolov5 website (example for YOLOv5n)
wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5n.pt
4. Generate cfg and wts files (example for YOLOv5n)
python3 gen_wts_yoloV5.py -w yolov5n.pt
5. Copy generated cfg and wts files to DeepStream-Yolo folder
6. Open DeepStream-Yolo folder
7. Compile lib
- x86 platform
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
- Jetson platform
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
8. Edit config_infer_primary_yoloV5.txt for your model (example for YOLOv5n)
[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov5n.cfg
# WTS
model-file=yolov5n.wts
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
9. Change the deepstream_app_config.txt file
...
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV5.txt
10. Run
deepstream-app -c deepstream_app_config.txt
NOTE: For YOLOv5 P6 or custom models, check the gen_wts_yoloV5.py args and use them according to your model
- Input weights (.pt) file path (required)
-w or --weights
- Input cfg (.yaml) file path
-c or --yaml
- Model width (default = 640 / 1280 [P6])
-mw or --width
- Model height (default = 640 / 1280 [P6])
-mh or --height
- Model channels (default = 3)
-mc or --channels
- P6 model
--p6
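The options above could be declared with argparse roughly as follows. This is a hypothetical reconstruction from the list, not the actual source of gen_wts_yoloV5.py; build_parser and parse_args are illustrative names, and the way the 640 / 1280 default is resolved from --p6 is an assumption.

```python
import argparse


def build_parser():
    """Declare the documented flags (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(description="Sketch of the gen_wts_yoloV5.py options")
    parser.add_argument("-w", "--weights", required=True, help="Input weights (.pt) file path")
    parser.add_argument("-c", "--yaml", help="Input cfg (.yaml) file path")
    parser.add_argument("-mw", "--width", type=int, help="Model width")
    parser.add_argument("-mh", "--height", type=int, help="Model height")
    parser.add_argument("-mc", "--channels", type=int, default=3, help="Model channels")
    parser.add_argument("--p6", action="store_true", help="P6 model")
    return parser


def parse_args(argv):
    """Parse argv and fill in the size defaults: 640, or 1280 for P6 models."""
    args = build_parser().parse_args(argv)
    default_size = 1280 if args.p6 else 640
    if args.width is None:
        args.width = default_size
    if args.height is None:
        args.height = default_size
    return args
```

With this sketch, parse_args(["-w", "yolov5n6.pt", "--p6"]) resolves to a 1280x1280 input with 3 channels, while explicit -mw and -mh values override the defaults for non square models.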
YOLOR usage
1. Copy gen_wts_yolor.py from DeepStream-Yolo/utils to yolor folder
2. Open the yolor folder
3. Download pt file from yolor website
4. Generate wts file (example for YOLOR-CSP)
python3 gen_wts_yolor.py -w yolor_csp.pt -c cfg/yolor_csp.cfg
5. Copy cfg and generated wts files to DeepStream-Yolo folder
6. Open DeepStream-Yolo folder
7. Compile lib
- x86 platform
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
- Jetson platform
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
8. Edit config_infer_primary_yolor.txt for your model (example for YOLOR-CSP)
[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolor_csp.cfg
# WTS
model-file=yolor_csp.wts
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
9. Change the deepstream_app_config.txt file
...
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yolor.txt
10. Run
deepstream-app -c deepstream_app_config.txt
INT8 calibration
1. Install OpenCV
sudo apt-get install libopencv-dev
2. Compile/recompile the nvdsinfer_custom_impl_Yolo lib with OpenCV support
- x86 platform
cd DeepStream-Yolo
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
- Jetson platform
cd DeepStream-Yolo
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
3. For COCO dataset, download the val2017, extract, and move to DeepStream-Yolo folder
Select 1000 random images from COCO dataset to run calibration
mkdir calibration
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
cp ${jpg} calibration/; \
done
Create the calibration.txt file with all selected images
realpath calibration/*jpg > calibration.txt
Set environment variables
export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
Change config_infer_primary.txt file
- From
...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...
- To
...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
Run
deepstream-app -c deepstream_app_config.txt
NOTE: NVIDIA recommends at least 500 images to get good accuracy. In this example I used 1000 images to get better accuracy (more images = more accuracy). Higher INT8_CALIB_BATCH_SIZE values will increase the accuracy and calibration speed. Set it according to your GPU memory. This process can take a long time.
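The image selection and calibration.txt steps above can also be done in Python, which avoids parsing ls output. This is a minimal sketch equivalent in intent to the shell commands; build_calibration_set and its parameters are hypothetical names, not part of this repository.

```python
import random
import shutil
from pathlib import Path


def build_calibration_set(src_dir, dst_dir, list_file, count=1000, seed=None):
    """Copy up to `count` random .jpg files and list their absolute paths."""
    images = sorted(Path(src_dir).glob("*.jpg"))
    rng = random.Random(seed)  # pass a seed for a reproducible selection
    chosen = rng.sample(images, min(count, len(images)))

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = [shutil.copy(img, dst / img.name) for img in chosen]

    # One absolute path per line, like `realpath calibration/*jpg`.
    paths = sorted(str(Path(p).resolve()) for p in copied)
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths
```

For the COCO example, build_calibration_set("val2017", "calibration", "calibration.txt") produces the same folder and list file that INT8_CALIB_IMG_PATH points to.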
Extract metadata
You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit deepstream_python_apps.
You need to read NvDsObjectMeta (Python/C++), NvDsFrameMeta (Python/C++) and NvOSD_RectParams (Python/C++) to get the label, position, etc. of the bounding boxes.
In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe function.
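As a rough Python illustration, a probe in the style of the deepstream_python_apps samples could walk the frame and object lists and print each box. This is a sketch, assuming pyds and the GStreamer Python bindings are installed; object_to_dict is a hypothetical helper, and the probe still has to be attached to a pad in your own pipeline code.

```python
def object_to_dict(frame_num, obj_meta):
    """Collect label, confidence and bbox from an NvDsObjectMeta-like object."""
    rect = obj_meta.rect_params  # NvOSD_RectParams
    return {
        "frame": frame_num,
        "label": obj_meta.obj_label,
        "class_id": obj_meta.class_id,
        "confidence": obj_meta.confidence,
        "left": rect.left,
        "top": rect.top,
        "width": rect.width,
        "height": rect.height,
    }


def osd_sink_pad_buffer_probe(pad, info, u_data):
    """Pad probe that prints every detected object in the batch."""
    # Imported here so the helper above can be used without DeepStream.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            print(object_to_dict(frame_meta.frame_num, obj_meta))
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```

Keeping the field extraction in a separate helper makes it easy to replace the print call with whatever your application needs, such as writing to a file or a message queue.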
My projects: https://www.youtube.com/MarcosLucianoTV (new videos and tutorials coming soon)