NVIDIA / tensorrt-laboratory

Explore the Capabilities of the TensorRT Platform

Home Page: https://developer.nvidia.com/tensorrt

Model chaining example

SlipknotTN opened this issue

The README mentions the possibility of model chaining ("Model Chaining: Model A -> Glue -> Model B -> etc."), but I didn't find an example in the repository.

Is there an example available? Any hints on how to do that?

I'd like to group multiple models into a single client call to save transfer time.

I’ll whip up an example.

Help me understand your use case a bit more and I'll see if I can put together an example that helps you get where you want to go.

Thank you, my use case is like this:

Client sends image -> Model A (TensorRT) on server -> Model B (TensorRT) on server -> custom C++ code on server -> results back to the client.

The intermediate results are large, so I'd like to keep the processing on the server end-to-end.
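
For illustration, here is a minimal plain TensorRT/CUDA sketch of that server-side chaining. It is not the tensorrt-laboratory API: it assumes a TensorRT 8-style interface, and the engine file names, binding order, and tensor sizes are placeholders. The point is that Model B reads Model A's output directly from device memory, so the intermediate tensor never leaves the GPU:

```cpp
// Hypothetical sketch only: file names, binding order, and tensor sizes are
// placeholders, and this uses plain TensorRT 8-style APIs rather than the
// tensorrt-laboratory wrappers.
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
  }
};

static std::vector<char> readFile(const char* path) {
  std::ifstream f(path, std::ios::binary);
  return {std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>()};
}

int main() {
  Logger logger;
  std::unique_ptr<nvinfer1::IRuntime> runtime(nvinfer1::createInferRuntime(logger));

  // Deserialize the two pre-built engines: Model A feeds Model B.
  auto planA = readFile("model_a.engine");
  auto planB = readFile("model_b.engine");
  std::unique_ptr<nvinfer1::ICudaEngine> engineA(
      runtime->deserializeCudaEngine(planA.data(), planA.size()));
  std::unique_ptr<nvinfer1::ICudaEngine> engineB(
      runtime->deserializeCudaEngine(planB.data(), planB.size()));
  std::unique_ptr<nvinfer1::IExecutionContext> ctxA(engineA->createExecutionContext());
  std::unique_ptr<nvinfer1::IExecutionContext> ctxB(engineB->createExecutionContext());

  // Placeholder sizes: input image, intermediate features, final scores.
  const size_t inBytes  = 3 * 224 * 224 * sizeof(float);
  const size_t midBytes = 1024 * 14 * 14 * sizeof(float);
  const size_t outBytes = 1000 * sizeof(float);

  void *dIn = nullptr, *dMid = nullptr, *dOut = nullptr;
  cudaMalloc(&dIn, inBytes);
  cudaMalloc(&dMid, midBytes);  // shared buffer: output of A, input of B
  cudaMalloc(&dOut, outBytes);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  std::vector<float> hostIn(3 * 224 * 224, 0.f), hostOut(1000);
  cudaMemcpyAsync(dIn, hostIn.data(), inBytes, cudaMemcpyHostToDevice, stream);

  // Assumed binding order for both engines: [input, output].
  void* bindingsA[] = {dIn, dMid};
  void* bindingsB[] = {dMid, dOut};
  ctxA->enqueueV2(bindingsA, stream, nullptr);  // Model A
  ctxB->enqueueV2(bindingsB, stream, nullptr);  // Model B reads A's output on-device

  // Only the final result crosses back to the host / client.
  cudaMemcpyAsync(hostOut.data(), dOut, outBytes, cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);

  cudaFree(dIn); cudaFree(dMid); cudaFree(dOut);
  cudaStreamDestroy(stream);
  return 0;
}
```

The CyclicAllocator and ExecutionContext buffer-reuse utilities mentioned below would take the place of the raw cudaMalloc calls, but the data flow is the same.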

The outputs of Model A are the inputs for Model B?

How about this for an example:

  • Decompose ResNet-152 into two TensorRT engines
    • Model A = base model which consists of the first 100-ish layers
    • Model B = customization model which consists of the remaining layers
    • Presumably you could have many customized models that all leverage the same base model.
    • The inference request will specify: base_model, customized_model
    • We will use the buffer reuse options of the CyclicAllocator and the ExecutionContext to minimize the memory footprint for the transaction
  • Provide the custom C++ post-processing lambda (see the sketch after this list)
    • Assume that the post-processing is large, so we'll provide some dedicated threads for "extra post-processing" outside the typical lifecycle.
    • We'll add a random 1-2ms of "post-processing"
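
For the post-processing piece, here is a rough standalone standard-C++ sketch (the class, thread count, and the work itself are hypothetical, not tensorrt-laboratory API) of dedicated worker threads that take the final tensor off the inference path and simulate 1-2 ms of custom work:

```cpp
// Hypothetical sketch of the "extra post-processing" step: hand the final
// output tensor to a small pool of dedicated worker threads instead of doing
// the work on the inference thread. The 1-2 ms sleep stands in for real
// custom C++ post-processing.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

class PostProcessPool {
 public:
  explicit PostProcessPool(std::size_t nThreads) {
    for (std::size_t i = 0; i < nThreads; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~PostProcessPool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // drains remaining tasks, then joins
  }
  // Enqueue work so it runs off the inference thread.
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (stop_ && tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
};

int main() {
  PostProcessPool pool(2);  // dedicated post-processing threads

  // After Model B finishes, hand its (copied-out) output to the pool.
  std::vector<float> logits(1000, 0.f);
  pool.submit([logits] {
    // Simulate 1-2 ms of custom post-processing (argmax, NMS, etc.).
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> us(1000, 2000);
    std::this_thread::sleep_for(std::chrono::microseconds(us(rng)));
    // ... compute and return results to the client here ...
  });
  return 0;  // pool destructor waits for outstanding work
}
```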

Sorry for the question, but I saw that TensorRT Inference Server allows chaining more than one model (the feature is actually still in development), and that it is possible to add custom C++ code as a custom backend model. What is the relation between this project and TensorRT Inference Server? Is this a lower-level version of TRTIS?

Good question. NvRPC in TRTIS originated from this project. I hope someday the team pulls in the TensorRT runtime.