tensorflow / tfx-addons

Developers helping developers. TFX-Addons is a collection of community projects to build new components, examples, libraries, and tools for TFX. The projects are organized under the auspices of the special interest group, SIG TFX-Addons. Join the group at http://goo.gle/tfx-addons-group

A load test of the exported model

michalbrys opened this issue · comments

It would be helpful to have a component (dedicated, or an extension to the InfraValidator TFX Pipeline Component) that performs a load test of the exported model.

The expected behavior is to load the exported model, create a TensorFlow Serving endpoint, send requests to the prediction endpoint, and measure the response time. The load test could be performed with common open-source tools such as Locust or Vegeta for HTTP, or ghz for gRPC.
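For illustration, here is a minimal sketch of what the Locust-based variant could look like against TensorFlow Serving's REST predict API. The model name `my_model` and the input payload are placeholders and would have to match the actual exported model's signature:

```python
# Minimal Locust load-test sketch against a TensorFlow Serving REST endpoint.
# Assumes TF Serving is running with a model named "my_model"; the payload
# below is a placeholder and must match the model's serving signature.
from locust import HttpUser, task, between


class ServingUser(HttpUser):
    # Each simulated user waits 0.1-0.5 s between requests.
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        # TF Serving's REST predict API: POST /v1/models/<name>:predict
        self.client.post(
            "/v1/models/my_model:predict",
            json={"instances": [[1.0, 2.0, 3.0, 4.0]]},
        )
```

This could be run headless against a local Serving instance with something like `locust -f loadtest.py --host http://localhost:8501 --headless -u 50 -r 10 -t 1m`; Locust's built-in stats then give request rate and response-time percentiles, which is roughly the measurement the component would need to capture.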

The motivation is that prediction time can vary depending on the model's type and structure. With this component, we could check early whether the model will meet the business requirements for prediction latency.

Larry and Hannes both confirmed that they have a requirement for low-latency serving that would benefit from this. Open questions remain about pushing and how to structure the pipeline: the test Serving instance would not produce a blessing, but a blessing would be required for the production Push. Exactly reproducing the production environment is difficult. InfraValidator does something similar and may be using TF Serving. Similar to Evaluator, we could perhaps compare two models to assess the performance difference on the current test infra, as sketched below.
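As a rough illustration of that Evaluator-like comparison (not part of TFX; the endpoint URLs and payload are hypothetical), the component could measure per-request latency against a baseline and a candidate model and compare percentiles:

```python
# Rough sketch: compare latency of two serving endpoints, analogous to how
# Evaluator compares model quality. URLs and payload are hypothetical.
import statistics
import time

import requests

PAYLOAD = {"instances": [[1.0, 2.0, 3.0, 4.0]]}


def measure_latencies(url: str, n_requests: int = 200) -> list[float]:
    """Send n_requests predictions and return per-request latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        requests.post(url, json=PAYLOAD, timeout=5).raise_for_status()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def p95(samples: list[float]) -> float:
    # 95th percentile of the observed latencies.
    return statistics.quantiles(samples, n=100)[94]


baseline = measure_latencies("http://localhost:8501/v1/models/baseline:predict")
candidate = measure_latencies("http://localhost:8502/v1/models/candidate:predict")
print(f"baseline  p95: {p95(baseline):.1f} ms")
print(f"candidate p95: {p95(candidate):.1f} ms")
# A pipeline could bless the candidate only if its p95 stays within budget.
```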

Is this currently being worked on?