tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Performance Degradation from 2.4.1 to 2.11.0

salliewalecka opened this issue · comments

Bug Report

If this is a bug report, please fill out the following form in full:

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): cloud.google.com/gke-os-distribution: cos kubernetes.io/os: linux
  • TensorFlow Serving installed from (source or binary): binary
  • TensorFlow Serving version: upgrading from 2.4.1 to 2.11.0

Describe the problem

I have a trained artifact of a simple neural net with semi-complex feature transformations built with TensorFlow Transform. I've been running it for quite some time on 2.4.1 and recently looked into serving it with 2.11.0. With two different models, I've seen the image upgrade take the model from ~20ms p90 to >200ms, as measured by the histograms of :tensorflow:core:graph_run_time_usecs_histogram_bucket. One model was trained with an earlier TF version and the second with TF 2.9.

Server-side batching is not enabled and CPU utilization is at 6% with no CPU throttling (container_cpu_cfs_throttled_periods_total). All serving configuration is left at defaults.

In the 2.4.1 startup logs I see external/org_tensorflow/tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input, whereas the 2.11.0 startup logs contain no Grappler lines at all.

Ideally, we would not see a drastic performance degradation from a version upgrade. I'm looking to find out whether there were any changes to graph-execution-related code in the recent releases, and what remediations are possible to achieve performance parity.
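
One way to isolate graph execution from the serving layer (a rough sketch; the model path and input name below are placeholders) is to load the same SavedModel under the TF runtime matching each Serving release and time the serving signature directly:

import time
import tensorflow as tf

# Load the same SavedModel under TF 2.4 and TF 2.11 and compare timings.
loaded = tf.saved_model.load("/path/to/saved_model/1")
infer = loaded.signatures["serving_default"]
example = {"text": tf.constant(["example input string"])}  # placeholder input

for _ in range(10):          # warm-up
    infer(**example)

n = 1000
start = time.perf_counter()
for _ in range(n):
    infer(**example)
print("mean latency (ms):", (time.perf_counter() - start) / n * 1e3)

If the regression reproduces here, it points at the TF runtime/graph optimizations rather than the serving layer itself.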

Exact Steps to Reproduce

Cannot provide the model, but this has happened on 2 models.

Profiling info:

In general it seems like in the newer version more time is spent in feature preprocessing, whereas previously the dense layers dominated the profile. This profile was taken from a running TF Serving instance in our cluster under typical load.

2.11.0 top ops
TensorListFromTensor
StringSplitV2
PadV2
StatefulPartitionedCall
RaggedTensorToTensor

2.4.1 top ops
_FusedMatMul
StaticRegexReplace
Mul
Less
StringSplitV2
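
A profile like the one above can, in principle, be captured from a running model server with the TensorBoard profiler client while load is applied; this is only a sketch, and the server address and log directory are placeholders:

import tensorflow as tf

# Trace the model server over its gRPC port for 5 seconds, then inspect the
# result with `tensorboard --logdir /tmp/tfserving_profile`.
tf.profiler.experimental.client.trace(
    service_addr="grpc://localhost:8500",   # TF Serving gRPC port
    logdir="/tmp/tfserving_profile",
    duration_ms=5000)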

@salliewalecka,

Can you please share what kind of model you are using (e.g., layer types) so we can try to replicate the issue with a model of similar architecture? For more info, please go through the TF Serving Performance Guide. Thank you!

Thanks. I'm definitely familiar with the performance guide. I'm asking the modeller I'm working with for their high-level architecture and will get back to you shortly.

The model uses the Keras Layers API. It has several feature InputLayers and DenseFeatures, some TensorFlow Hub layers, and 4 fully-connected Dense layers with relu activation and Batch Normalization.
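
A minimal sketch of a comparable architecture (the feature names, the Hub module, and the layer sizes below are illustrative assumptions, not from this thread) might look like:

import tensorflow as tf
import tensorflow_hub as hub

# Illustrative numeric features plus one string feature fed to a Hub embedding.
numeric_cols = [tf.feature_column.numeric_column(n) for n in ("f1", "f2", "f3")]
inputs = {n: tf.keras.Input(shape=(1,), name=n) for n in ("f1", "f2", "f3")}
inputs["text"] = tf.keras.Input(shape=(), dtype=tf.string, name="text")

dense_features = tf.keras.layers.DenseFeatures(numeric_cols)(inputs)
text_embedding = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2")(inputs["text"])  # placeholder module

x = tf.keras.layers.Concatenate()([dense_features, text_embedding])
for units in (256, 128, 64, 32):                 # 4 fully-connected blocks
    x = tf.keras.layers.Dense(units, activation="relu")(x)
    x = tf.keras.layers.BatchNormalization()(x)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=inputs, outputs=output)
model.save("/tmp/similar_model/1")  # versioned SavedModel layout for TF Serving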

@gharibian, could you please look into this performance degradation after the TF Serving upgrade from 2.4.1 to 2.11.0? Thank you!

Hey! Any other information I can provide to help triage?

Could you provide a self-contained reproduction? Otherwise, it is very difficult to debug your particular issue.

Yes, getting that over to you now! Thanks for your help.

@salliewalecka, could you please provide a self-contained reproduction so we can debug the performance issue? Thank you!

Thanks! Just shared access with you for the sample model.

@salliewalecka, thank you for sharing the sample model. It will be shared with the team internally.
We will update this thread with our findings. Thanks!

Please use the grpc_client tool introduced in 25c5125 to run a load test.

A sample run looks like this (against a model server running locally in Docker):

$ bazel-bin/tensorflow_serving/test_util/grpc_client \
  --server_port=localhost:8500 \
  --request=/tmp/request.bin \
  --model_name=test \
  --num_requests=10000 \
  --qps=500
Sending 10000 requests to localhost:8500 at 500 requests/sec.
Waiting for 10000 requests to complete...
Request stats (successful)
Count: 10000  Average: 19005.6747  StdDev: 2127.42
Min: 12492.0000  Median: 19024.3553  Max: 35999.0000
------------------------------------------------------
[    1.1e+04,    1.3e+04 )       2   0.020%   0.020% 
[    1.3e+04,    1.4e+04 )      22   0.220%   0.240% 
[    1.4e+04,    1.5e+04 )     241   2.410%   2.650% 
[    1.5e+04,    1.7e+04 )    1380  13.800%  16.450% ###
[    1.7e+04,    1.9e+04 )    2402  24.020%  40.470% #####
[    1.9e+04,      2e+04 )    3406  34.060%  74.530% #######
[      2e+04,    2.2e+04 )    2122  21.220%  95.750% ####
[    2.2e+04,    2.5e+04 )     326   3.260%  99.010% #
[    2.5e+04,    2.7e+04 )      66   0.660%  99.670% 
[    2.7e+04,      3e+04 )      22   0.220%  99.890% 
[      3e+04,    3.3e+04 )       6   0.060%  99.950% 
[    3.3e+04,    3.6e+04 )       5   0.050% 100.000%
$
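
The file passed via --request is a serialized PredictRequest; a minimal sketch of producing one (the model name, signature, and input name here are assumptions) looks like:

import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

# Build a PredictRequest and write it in the binary form grpc_client expects.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"                       # must match --model_name
request.model_spec.signature_name = "serving_default"
request.inputs["text"].CopyFrom(                       # placeholder input tensor
    tf.make_tensor_proto(["example input string"], dtype=tf.string))

with open("/tmp/request.bin", "wb") as f:
    f.write(request.SerializeToString())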

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

This issue was closed due to lack of activity after being marked stale for the past 7 days.
