tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Performance Degradation from 2.4.1 to 2.11.0

salliewalecka opened this issue · comments

Bug Report

If this is a bug report, please fill out the following form in full:

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): cloud.google.com/gke-os-distribution: cos kubernetes.io/os: linux
  • TensorFlow Serving installed from (source or binary): binary
  • TensorFlow Serving version: upgrading from 2.4.1 to 2.11.0

Describe the problem

I have a trained artifact of a simple neural net with semi-complex feature transformations built with TensorFlow Transform. I've been running it for quite some time on 2.4.1 and recently looked into serving it with 2.11.0. With two different models, I've seen the image upgrade take the model from ~20ms p90 to >200ms, as measured by the histograms of :tensorflow:core:graph_run_time_usecs_histogram_bucket. One model was trained with an earlier TF version and the second with TF 2.9.

Server-side batching is not enabled and CPU utilization is at 6% with no CPU throttling (container_cpu_cfs_throttled_periods_total). All serving configuration is left at defaults.

In the 2.4.1 startup logs I see external/org_tensorflow/tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input, whereas the 2.11.0 startup logs contain no Grappler lines at all.

Ideally, we would not see a drastic performance degradation from a version upgrade. I'm looking to find out whether there were any changes to graph-execution-related code in the recent releases, and what remediations are possible to achieve performance parity.
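
One way to isolate graph execution from the serving layer (a rough sketch; the model path and input name below are placeholders) is to load the same SavedModel under the TF runtime matching each Serving release and time the serving signature directly:

import time
import tensorflow as tf

# Load the same SavedModel under TF 2.4 and TF 2.11 and compare timings.
loaded = tf.saved_model.load("/path/to/saved_model/1")
infer = loaded.signatures["serving_default"]
example = {"text": tf.constant(["example input string"])}  # placeholder input

for _ in range(10):          # warm-up
    infer(**example)

n = 1000
start = time.perf_counter()
for _ in range(n):
    infer(**example)
print("mean latency (ms):", (time.perf_counter() - start) / n * 1e3)

If the regression reproduces here, it points at the TF runtime/graph optimizations rather than the serving layer itself.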

Exact Steps to Reproduce

Cannot provide the model, but this has happened on 2 models.

Profiling info:

In general it seems like in the newer version more time is spent in feature preprocessing, whereas previously the dense layers dominated the profile. This profile was taken from a running TF Serving instance in our cluster under typical load.

2.11.0 top ops
TensorListFromTensor
StringSplitV2
PadV2
StatefulPartitionedCall
RaggedTensorToTensor

2.4.1 top ops
_FusedMatMul
StaticRegexReplace
Mul
Less
StringSplitV2
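
A profile like the one above can, in principle, be captured from a running model server with the TensorBoard profiler client while load is applied; this is only a sketch, and the server address and log directory are placeholders:

import tensorflow as tf

# Trace the model server over its gRPC port for 5 seconds, then inspect the
# result with `tensorboard --logdir /tmp/tfserving_profile`.
tf.profiler.experimental.client.trace(
    service_addr="grpc://localhost:8500",   # TF Serving gRPC port
    logdir="/tmp/tfserving_profile",
    duration_ms=5000)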

@salliewalecka,

Can you please share what kind of model you are using (e.g., layer types) so we can try to replicate the issue with a model of similar architecture? For more info, please go through the TF Serving Performance Guide. Thank you!

Thanks. I'm definitely familiar with the performance guide. I'm asking the modeller I'm working with for their high-level architecture and will get back to you shortly.

The model uses the Keras Layers API. It has several feature InputLayers and DenseFeatures, some TensorFlow Hub layers, and 4 fully-connected Dense layers with relu activation and Batch Normalization.
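
A minimal sketch of a comparable architecture (the feature names, the Hub module, and the layer sizes below are illustrative assumptions, not from this thread) might look like:

import tensorflow as tf
import tensorflow_hub as hub

# Illustrative numeric features plus one string feature fed to a Hub embedding.
numeric_cols = [tf.feature_column.numeric_column(n) for n in ("f1", "f2", "f3")]
inputs = {n: tf.keras.Input(shape=(1,), name=n) for n in ("f1", "f2", "f3")}
inputs["text"] = tf.keras.Input(shape=(), dtype=tf.string, name="text")

dense_features = tf.keras.layers.DenseFeatures(numeric_cols)(inputs)
text_embedding = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2")(inputs["text"])  # placeholder module

x = tf.keras.layers.Concatenate()([dense_features, text_embedding])
for units in (256, 128, 64, 32):                 # 4 fully-connected blocks
    x = tf.keras.layers.Dense(units, activation="relu")(x)
    x = tf.keras.layers.BatchNormalization()(x)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=inputs, outputs=output)
model.save("/tmp/similar_model/1")  # versioned SavedModel layout for TF Serving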

@gharibian, could you please look into this performance degradation after the TF Serving upgrade from 2.4.1 to 2.11.0? Thank you!

Hey! Any other information I can provide to help triage?

Could you provide a self-contained reproduction? Otherwise, it is very difficult to debug your particular issue.

Yes, getting that over to you now! Thanks for your help.

@salliewalecka, could you please provide a self-contained reproduction so we can debug the performance issue? Thank you!

Thanks! Just shared access with you for the sample model.

@salliewalecka, thank you for sharing the sample model. It will be shared with the team internally.
We will update this thread with our findings. Thanks!

Please use the grpc_client tool introduced in 25c5125 to run a load test.

A sample run looks like this (against a model server running locally in Docker):

$ bazel-bin/tensorflow_serving/test_util/grpc_client \
  --server_port=localhost:8500 \
  --request=/tmp/request.bin \
  --model_name=test \
  --num_requests=10000 \
  --qps=500
Sending 10000 requests to localhost:8500 at 500 requests/sec.
Waiting for 10000 requests to complete...
Request stats (successful)
Count: 10000  Average: 19005.6747  StdDev: 2127.42
Min: 12492.0000  Median: 19024.3553  Max: 35999.0000
------------------------------------------------------
[    1.1e+04,    1.3e+04 )       2   0.020%   0.020% 
[    1.3e+04,    1.4e+04 )      22   0.220%   0.240% 
[    1.4e+04,    1.5e+04 )     241   2.410%   2.650% 
[    1.5e+04,    1.7e+04 )    1380  13.800%  16.450% ###
[    1.7e+04,    1.9e+04 )    2402  24.020%  40.470% #####
[    1.9e+04,      2e+04 )    3406  34.060%  74.530% #######
[      2e+04,    2.2e+04 )    2122  21.220%  95.750% ####
[    2.2e+04,    2.5e+04 )     326   3.260%  99.010% #
[    2.5e+04,    2.7e+04 )      66   0.660%  99.670% 
[    2.7e+04,      3e+04 )      22   0.220%  99.890% 
[      3e+04,    3.3e+04 )       6   0.060%  99.950% 
[    3.3e+04,    3.6e+04 )       5   0.050% 100.000%
$
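
The file passed via --request is a serialized PredictRequest; a minimal sketch of producing one (the model name, signature, and input name here are assumptions) looks like:

import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

# Build a PredictRequest and write it in the binary form grpc_client expects.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"                       # must match --model_name
request.model_spec.signature_name = "serving_default"
request.inputs["text"].CopyFrom(                       # placeholder input tensor
    tf.make_tensor_proto(["example input string"], dtype=tf.string))

with open("/tmp/request.bin", "wb") as f:
    f.write(request.SerializeToString())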

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

This issue was closed due to lack of activity after being marked stale for the past 7 days.
