NVIDIA / tensorrt-laboratory

Explore the Capabilities of the TensorRT Platform

Home Page: https://developer.nvidia.com/tensorrt

Model chaining example

SlipknotTN opened this issue

The README mentions the possibility of model chaining ("Model Chaining: Model A -> Glue -> Model B -> etc."), but I didn't find an example in the repository.

Is there an example available? Any hints on how to do that?

I'd like to group multiple models into a single client call to save transfer time.

I’ll whip up an example.

Help me understand your use case a bit more and I'll see if I can put together an example that helps you get where you want to go.

Thank you, my use case is like this:

Client sends image -> Model A (TensorRT) on server -> Model B (TensorRT) on server -> custom C++ code on server -> results back to the client.

The intermediate results are large, so I'd like to keep the processing on the server end-to-end.
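
For illustration, here is a minimal plain TensorRT/CUDA sketch of that server-side chaining. It is not the tensorrt-laboratory API: it assumes a TensorRT 8-style interface, and the engine file names, binding order, and tensor sizes are placeholders. The point is that Model B reads Model A's output directly from device memory, so the intermediate tensor never leaves the GPU:

```cpp
// Hypothetical sketch only: file names, binding order, and tensor sizes are
// placeholders, and this uses plain TensorRT 8-style APIs rather than the
// tensorrt-laboratory wrappers.
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
  }
};

static std::vector<char> readFile(const char* path) {
  std::ifstream f(path, std::ios::binary);
  return {std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>()};
}

int main() {
  Logger logger;
  std::unique_ptr<nvinfer1::IRuntime> runtime(nvinfer1::createInferRuntime(logger));

  // Deserialize the two pre-built engines: Model A feeds Model B.
  auto planA = readFile("model_a.engine");
  auto planB = readFile("model_b.engine");
  std::unique_ptr<nvinfer1::ICudaEngine> engineA(
      runtime->deserializeCudaEngine(planA.data(), planA.size()));
  std::unique_ptr<nvinfer1::ICudaEngine> engineB(
      runtime->deserializeCudaEngine(planB.data(), planB.size()));
  std::unique_ptr<nvinfer1::IExecutionContext> ctxA(engineA->createExecutionContext());
  std::unique_ptr<nvinfer1::IExecutionContext> ctxB(engineB->createExecutionContext());

  // Placeholder sizes: input image, intermediate features, final scores.
  const size_t inBytes  = 3 * 224 * 224 * sizeof(float);
  const size_t midBytes = 1024 * 14 * 14 * sizeof(float);
  const size_t outBytes = 1000 * sizeof(float);

  void *dIn = nullptr, *dMid = nullptr, *dOut = nullptr;
  cudaMalloc(&dIn, inBytes);
  cudaMalloc(&dMid, midBytes);  // shared buffer: output of A, input of B
  cudaMalloc(&dOut, outBytes);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  std::vector<float> hostIn(3 * 224 * 224, 0.f), hostOut(1000);
  cudaMemcpyAsync(dIn, hostIn.data(), inBytes, cudaMemcpyHostToDevice, stream);

  // Assumed binding order for both engines: [input, output].
  void* bindingsA[] = {dIn, dMid};
  void* bindingsB[] = {dMid, dOut};
  ctxA->enqueueV2(bindingsA, stream, nullptr);  // Model A
  ctxB->enqueueV2(bindingsB, stream, nullptr);  // Model B reads A's output on-device

  // Only the final result crosses back to the host / client.
  cudaMemcpyAsync(hostOut.data(), dOut, outBytes, cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);

  cudaFree(dIn); cudaFree(dMid); cudaFree(dOut);
  cudaStreamDestroy(stream);
  return 0;
}
```

The CyclicAllocator and ExecutionContext buffer-reuse utilities mentioned below would take the place of the raw cudaMalloc calls, but the data flow is the same.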

The outputs of Model A are the inputs for Model B?

How about this for an example:

  • Decompose ResNet-152 into two TensorRT engines
    • Model A = base model which consists of the first 100-ish layers
    • Model B = customization model which consists of the remaining layers
    • Presumably you could have many customized models that all leverage the same base model.
    • The inference request will specify: base_model, customized_model
    • We will use the buffer reuse options of the CyclicAllocator and the ExecutionContext to minimize the memory footprint for the transaction
  • Provide the custom C++ post-processing lambda (see the sketch after this list)
    • Assume that the post-processing is large, so we'll provide some dedicated threads for "extra post-processing" outside the typical lifecycle.
    • We'll add a random 1-2ms of "post-processing"
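
For the post-processing piece, here is a rough standalone standard-C++ sketch (the class, thread count, and the work itself are hypothetical, not tensorrt-laboratory API) of dedicated worker threads that take the final tensor off the inference path and simulate 1-2 ms of custom work:

```cpp
// Hypothetical sketch of the "extra post-processing" step: hand the final
// output tensor to a small pool of dedicated worker threads instead of doing
// the work on the inference thread. The 1-2 ms sleep stands in for real
// custom C++ post-processing.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

class PostProcessPool {
 public:
  explicit PostProcessPool(std::size_t nThreads) {
    for (std::size_t i = 0; i < nThreads; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~PostProcessPool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // drains remaining tasks, then joins
  }
  // Enqueue work so it runs off the inference thread.
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (stop_ && tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
};

int main() {
  PostProcessPool pool(2);  // dedicated post-processing threads

  // After Model B finishes, hand its (copied-out) output to the pool.
  std::vector<float> logits(1000, 0.f);
  pool.submit([logits] {
    // Simulate 1-2 ms of custom post-processing (argmax, NMS, etc.).
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> us(1000, 2000);
    std::this_thread::sleep_for(std::chrono::microseconds(us(rng)));
    // ... compute and return results to the client here ...
  });
  return 0;  // pool destructor waits for outstanding work
}
```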

Sorry for the question, but I saw that TensorRT Inference Server allows chaining more than one model (the feature is actually still in development), and that it is possible to add custom C++ code as a custom backend model. What is the relation between this project and TensorRT Inference Server? Is this a lower-level version of TRTIS?

Good question. NvRPC in TRTIS originated from this project. I hope someday the team pulls in the TensorRT runtime.