NVIDIA / Deep-Learning-Accelerator-SW

NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

Need more DLA samples

ram-cherukuri opened this issue

Please use this as a forum to tell us what types of samples would be most useful to you for leveraging DLA effectively in your application development. We will try our best to address the requests.

This is great!
I am learning to program DLA with cuDLA in standalone mode, using the CUDA sample cuDLAStandaloneMode.
I created the loadable binary with TensorRT using this command:
trtexec --deploy=/usr/src/tensorrt/data/resnet50/ResNet50_N2.prototxt --model=/usr/src/tensorrt/data/resnet50/ResNet50_fp32.caffemodel --output=fc1000 --useDLACore=0 --int8 --memPoolSize=dlaSRAM:1 --inputIOFormats=int8:chw --outputIOFormats=int8:chw --saveEngine=./resnet_50_int8_chw.bin --buildOnly --safe
The inputs and outputs of the loadable are int8, while the original model's are fp32. The sample mentioned above has no code showing how to pre-process the fp32 input to int8 or how to post-process the int8 output back to fp32.
So, could you post a sample that demonstrates how to feed fp32 input to an int8 DLA model?
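
For anyone landing here with the same question, here is a minimal host-side sketch of the conversion being asked about. It is not taken from the cuDLA samples. It assumes symmetric per-tensor quantization, where each I/O tensor has a single scale (commonly the tensor's dynamic range divided by 127) that matches whatever the builder used for that tensor. The helper names `quantizeToInt8` and `dequantizeFromInt8` are hypothetical, and the sketch assumes the linear int8:chw layout from the command above, so element order is left unchanged.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fp32 -> int8 with one per-tensor scale.
// Assumption: q = clamp(round(x / scale), -128, 127), and `scale` is the
// value the builder used for this input tensor.
std::vector<std::int8_t> quantizeToInt8(const std::vector<float>& input, float scale) {
    std::vector<std::int8_t> out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Round to the nearest integer (current rounding mode, ties-to-even by default).
        float q = std::nearbyint(input[i] / scale);
        // Saturate to the representable int8 range before narrowing.
        q = std::min(127.0f, std::max(-128.0f, q));
        out[i] = static_cast<std::int8_t>(q);
    }
    return out;
}

// Hypothetical helper: int8 -> fp32 using the output tensor's scale.
std::vector<float> dequantizeFromInt8(const std::vector<std::int8_t>& input, float scale) {
    std::vector<float> out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        out[i] = static_cast<float>(input[i]) * scale;
    }
    return out;
}
```

Under these assumptions, the quantized buffer would be copied into the DLA input tensor before submitting the task, and the int8 output buffer would go through `dequantizeFromInt8` with the output tensor's own scale afterwards. If the scales are not known (for example, no calibration cache or explicit dynamic ranges were supplied at build time), the converted values will not be meaningful.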