NVIDIA / retinanet-examples

Fast and accurate object detection with end-to-end GPU optimization

odtk process gets killed when trying to export a model on Xavier NX

tahasadr opened this issue

Hi there,

Following up on my earlier issue, I tried to export my model on a Xavier NX but ran into several problems:

  • No precompiled NVIDIA DALI
    Solved via cross-compiling as described in this
  • No tk module
    Solved with:
$ apt install python3-tk
  • The installed torch doesn't support NCCL
    Solved via building PyTorch from source as stated here, with these modifications (the full build flow is sketched after this list):
export USE_NCCL=1
export USE_DISTRIBUTED=1
export USE_CUDA=1
export USE_CUDNN=1
export USE_NUMPY=1
export USE_MKLDNN=1
export USE_NNPACK=1
export USE_QNNPACK=1
export USE_OPENCV=1
export USE_PYTORCH_QNNPACK=1
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
export PYTORCH_BUILD_VERSION=1.9.0
export PYTORCH_BUILD_NUMBER=1
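
In case it's useful, the surrounding build flow looked roughly like this. This is a sketch, not a verbatim transcript: the v1.9.0 tag and the wheel filename pattern are assumptions from the standard setup.py wheel build.

# Sketch of the PyTorch source build (assumed standard flow, not verbatim)
$ git clone --recursive --branch v1.9.0 https://github.com/pytorch/pytorch.git
$ cd pytorch
$ pip3 install -r requirements.txt
# set the USE_* / TORCH_CUDA_ARCH_LIST variables listed above, then:
$ python3 setup.py bdist_wheel
$ pip3 install dist/torch-*.whl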

My config:

  • JetPack 4.5.1
  • Docker image: nvcr.io/nvidia/l4t-ml:r32.5.0-py3
  • PyTorch 1.9 built from source
  • DALI 1.7
  • CUDA 10.2
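
(Versions were double-checked inside the container with something like the following; I'm assuming both packages expose the usual __version__ attribute:)

$ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
$ python3 -c "import nvidia.dali; print(nvidia.dali.__version__)"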

Now when I try to export my model, the process gets killed:

$ odtk export fine_tune_from_rn50fpn.pth engine.plan
NOTE! Installing ujson may make loading annotations faster.
Loading model from fine_tune_from_rn50fpn.pth...
     model: RetinaNet
  backbone: ResNet50FPN
   classes: 6, anchors: 9
Exporting to ONNX...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ../c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/usr/local/lib/python3.6/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Building FP16 core model...
Building accelerated plugins...
Applying optimizations and building TRT CUDA engine...
Killed
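
My guess is the board runs out of memory while TensorRT builds the engine and the kernel OOM killer steps in. A quick way to check (a sketch using standard Linux tooling; the swapfile path and size are just examples):

$ dmesg | grep -i -E "killed process|out of memory"
$ free -h
# if it is OOM, adding swap before exporting may help:
$ sudo fallocate -l 4G /var/swapfile
$ sudo chmod 600 /var/swapfile
$ sudo mkswap /var/swapfile
$ sudo swapon /var/swapfile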

Any ideas?
@yashnv