This repo contains artifacts and configurations to use the Triton Model Analyzer.
- Clone the Model Analyzer repo:

  ```bash
  git clone https://github.com/triton-inference-server/model_analyzer.git -b r22.10
  ```
- Build the Model Analyzer container (see the build instructions in the model_analyzer repo):

  ```bash
  cd ./model_analyzer
  docker build --pull -t model-analyzer .
  ```
- Clone this repo:

  ```bash
  git clone TBD
  cd model_analyzer_experiments
  ```
- Launch the Model Analyzer container with the following bind mounts, adding `--gpus=all` when GPUs are available (the expected layout of the `models` directory is sketched after this list):

  ```bash
  docker run -it --rm \
    -v $(pwd)/config:/config \
    -v $(pwd)/models:/models \
    -v $(pwd)/export:/export \
    -v $(pwd)/output:/output \
    --net=host model-analyzer
  ```
- From inside the container, run the experiments with the following command, modifying the config file as necessary (a sketch of the config file follows below):

  ```bash
  model-analyzer --verbose profile --config-file /config/modelanalyzer.yaml
  ```
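The `models` directory mounted above is expected to be a Triton model repository. A hypothetical layout, assuming a single ONNX model named `yolov5s` (the name is a placeholder):

```
models/
└── yolov5s/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```

A minimal sketch of what `/config/modelanalyzer.yaml` might contain, assuming the mount points from the `docker run` command above; the model name and search bounds are placeholders, so check the Model Analyzer config documentation for your release:

```yaml
# Hypothetical Model Analyzer profile config; adjust for your models.
model_repository: /models
output_model_repository_path: /output/output_model_repository
override_output_model_repository: true
export_path: /export

# Launch Triton inside the Model Analyzer container.
triton_launch_mode: local

profile_models:
  - yolov5s

# Bound the automatic config search.
run_config_search_max_concurrency: 16
run_config_search_max_instance_count: 2
```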
To supply custom input data to Perf Analyzer, see:
- https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#input-data
- triton-inference-server/model_analyzer#529 (comment)
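As a rough sketch, the input-data JSON and the corresponding Model Analyzer wiring could look like the following; the tensor name `images`, the sample values, and the file path are placeholders, and the exact flag names should be verified against the perf_analyzer and Model Analyzer docs linked above:

```json
{
  "data": [
    {
      "images": {
        "content": [0.0, 0.5, 1.0, 0.25],
        "shape": [1, 4]
      }
    }
  ]
}
```

The file can then be passed to perf_analyzer via `perf_analyzer_flags` in the Model Analyzer config:

```yaml
profile_models:
  yolov5s:
    perf_analyzer_flags:
      input-data: /config/input_data.json
```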
To configure ONNX Runtime backend parameters, there are two options:
- Set the following in the model's `config.pbtxt` (a fuller `config.pbtxt` sketch follows after this list):

  ```
  parameters { key: "intra_op_thread_count" value: { string_value: "16" } }
  parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
  ```
- Set the following as flags in the Model Analyzer YAML:

  ```yaml
  yolov5s:
    triton_server_flags:
      log_verbose: True
      backend-config:
        - onnxruntime,inter_op_thread_count=1
        - onnxruntime,intra_op_thread_count=1
        - onnxruntime,enable-global-threadpool=1
  ```
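For context, here is a hypothetical `config.pbtxt` showing where those `parameters` entries sit; the model name, tensor names, shapes, and batch size are placeholders and should be replaced with your model's actual values:

```
# Illustrative config.pbtxt for an ONNX model served by the onnxruntime backend.
name: "yolov5s"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 25200, 85 ]
  }
]

# ONNX Runtime session threading settings.
parameters { key: "intra_op_thread_count" value: { string_value: "16" } }
parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
```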