tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Add arguments to configure gRPC completion queue parameters

yarikmarkov opened this issue · comments

I discovered a performance issue: under higher traffic loads, TensorFlow Serving exhibits a significant and unexplained network delay at the tail latencies.

My setup was a client and a TensorFlow Serving server located on the same host, so in theory the client latency should be roughly equal to the server latency. The server was running a simple CPU model. In my experiment, client and server latency started to diverge at 50 QPS for the p99.99 tail latency. At 200 QPS the divergence reached 120-150 ms, even though the server latency was around 30 ms.

Upon debugging I discovered that TensorFlow Serving does not configure the gRPC completion queue parameters. By default the server is initialized with 1 completion queue, 1 poller minimum, and 2 pollers maximum. This appears to be a major bottleneck for applications that care about tail latency.

The parameters in question are grpc::ServerBuilder::SyncServerOption::NUM_CQS, grpc::ServerBuilder::SyncServerOption::MIN_POLLERS, and grpc::ServerBuilder::SyncServerOption::MAX_POLLERS.

After adding code to configure these parameters and setting them to values larger than the defaults, the divergence between server and client latency dropped to almost zero.
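For reference, these options can be set on the grpc::ServerBuilder before BuildAndStart(). A minimal sketch of the kind of change I made (the listening address and the specific values are illustrative, not TF Serving's actual code):

```cpp
#include <memory>
#include <grpcpp/grpcpp.h>

int main() {
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:8500", grpc::InsecureServerCredentials());

  // Tune the sync server's completion queues and poller threads.
  // Defaults are 1 CQ / 1 min poller / 2 max pollers; the right values
  // depend on core count and expected load.
  builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::NUM_CQS, 4);
  builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::MIN_POLLERS, 4);
  builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::MAX_POLLERS, 16);

  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
  server->Wait();
  return 0;
}
```

With this change, exposing the three values as command-line flags would only require plumbing them through to the SetSyncServerOption calls.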

Please add command-line arguments and the corresponding code in TensorFlow Serving for configuring these parameters.

It seems people have investigated this issue in the past: https://discuss.tensorflow.org/t/tensorflow-serving-grpc-mode/11613

@yarikmarkov,

Just to confirm: do you want TF Serving to have arguments that update these parameters in the gRPC code base?

@singhniraj08 Exactly. I want TF Serving to have command-line arguments that update the values of these parameters when initializing the gRPC server.