tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Restoring SavedModel bundle takes a long time (30 minutes) to load a TensorFlow model with the tensorflow/serving GPU Docker container

spate141 opened this issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow Serving installed from (source or binary): binary
  • TensorFlow Serving version: tensorflow/serving: 2.8.3-gpu

Describe the problem

I've set up a Docker container with the tensorflow/serving:2.8.3-gpu image that loads my model whenever a new AWS instance starts. The issue is that, on the first load, the model spends almost 30 minutes restoring the SavedModel and then another 10 minutes reading the warm-up data before it is fully ready to serve requests. This only happens the first time the model is loaded with the Docker container on a newly started AWS GPU instance. It seems the AWS GPU instance is slow to load everything from disk: the drivers and eventually the model itself. Is there any workaround to load the model faster?
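For reference, the container is started roughly like this (a minimal sketch; the model name taxonomy matches the logs below, while the host path and port mappings are assumptions):

docker run --rm --gpus all -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source=/home/ubuntu/models/taxonomy,target=/models/taxonomy \
  -e MODEL_NAME=taxonomy \
  tensorflow/serving:2.8.3-gpu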

Source code / logs

2022-10-11 21:06:09.092082: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-10-11 21:06:09.092166: I tensorflow_serving/model_servers/server_core.cc:594]  (Re-)adding model: taxonomy
2022-10-11 21:06:09.303206: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: taxonomy version: 1}
2022-10-11 21:06:09.303240: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: taxonomy version: 1}
2022-10-11 21:06:09.303279: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: taxonomy version: 1}
2022-10-11 21:06:09.303333: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /models/taxonomy/1
2022-10-11 21:06:29.251406: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-10-11 21:06:29.251455: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /models/taxonomy/1
2022-10-11 21:06:29.357402: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-11 21:06:31.453867: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:06:34.994300: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:06:34.995066: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:07:48.020169: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:07:48.020953: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:07:48.021642: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-11 21:07:48.022259: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13797 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5
2022-10-11 21:07:59.195412: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.

2022-10-11 21:37:25.469881: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: /models/taxonomy/1
2022-10-11 21:37:30.278330: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 1880974995 microseconds.
2022-10-11 21:37:31.269378: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:162] Wrapping session to perform batch processing
2022-10-11 21:37:31.269447: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:65] Wrapping session to perform batch processing
2022-10-11 21:37:31.269520: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:71] Starting to read warmup data for model at /models/taxonomy/1/assets.extra/tf_serving_warmup_requests with model-warmup-options 
2022-10-11 21:39:32.490074: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
2022-10-11 21:43:17.128290: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:122] Finished reading warmup data for model at /models/taxonomy/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1. Elapsed time (microseconds): 345858782.
2022-10-11 21:43:17.131935: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: taxonomy version: 1}
2022-10-11 21:43:17.133127: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-10-11 21:43:17.143960: I tensorflow_serving/model_servers/server.cc:133] Using InsecureServerCredentials
2022-10-11 21:43:17.143994: I tensorflow_serving/model_servers/server.cc:391] Profiler service is enabled
2022-10-11 21:43:17.578378: I tensorflow_serving/model_servers/server.cc:417] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
2022-10-11 21:43:17.625632: I tensorflow_serving/model_servers/server.cc:438] Exporting HTTP/REST API at:localhost:8501 ...

StackOverflow Link:

@spate141,

  1. Could you please explain your model path (i.e., from where you are loading your model into the Docker container)?
  2. Also, please make sure NVIDIA drivers are updated and the NVIDIA Container Toolkit is installed as mentioned here (a quick sanity check is sketched below this list).
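One way to confirm the drivers and the toolkit are working before starting TF Serving (the CUDA base image tag is only an illustrative choice; pick one that matches your installed driver):

# image tag is illustrative; any CUDA base image compatible with your driver works
docker run --rm --gpus all nvidia/cuda:11.2.2-base-ubuntu18.04 nvidia-smi

If this prints the usual GPU table, the container runtime can see the GPU.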

You can also try creating a serving image with the model built into the container, along the lines of the sketch below, to see if the model loads faster.
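For example, the standard way to bake a model into a serving image (a sketch; the host path /path/to/taxonomy is a placeholder for wherever your SavedModel version directories live):

# start a base serving container, copy the model in, and commit a new image
docker run -d --name serving_base tensorflow/serving:2.8.3-gpu
docker cp /path/to/taxonomy serving_base:/models/taxonomy
docker commit --change "ENV MODEL_NAME taxonomy" serving_base taxonomy-serving:gpu
docker kill serving_base
docker rm serving_base

The resulting taxonomy-serving:gpu image can then be run with --gpus all and ports 8500/8501 published, with no bind mount needed.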

Thank you!

@singhniraj08 Thanks for the reply!

  • I was mounting the model into the Docker tf 2.8.0 GPU image from the local root disk.
  • Yes, I'm running the latest NVIDIA drivers and the NVIDIA Container Toolkit.

I just created a custom Docker container with my model built into the serving image, and the load time is the same.

It turned out the actual issue is the following:

  • Amazon EBS volumes created from an AMI snapshot require initialization (formerly known as pre-warming), which makes everything run slowly the first time, including Docker loading the model from disk. From the AWS documentation:

    • Empty EBS volumes receive their maximum performance the moment that they are created and do not require initialization (formerly known as pre-warming). For volumes that were created from snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume before you can access them. This preliminary action takes time and can cause a significant increase in the latency of I/O operations the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.

As you can see, the first access to anything and everything on a new instance created from an AMI snapshot is going to be slow. I think this is more of an AWS issue than a TF one.
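For anyone who runs into the same problem: AWS documents initializing such a volume by reading every block once (for example with dd or fio) so that later reads, including the model load, run at full speed. A rough sketch, assuming the data lives on /dev/nvme1n1 (check the actual device with lsblk):

# read every block once so subsequent reads hit the volume at full performance
sudo fio --filename=/dev/nvme1n1 --rw=read --bs=1M --iodepth=32 \
    --ioengine=libaio --direct=1 --name=volume-initialize

This only helps if you can afford to pay the initialization cost up front, before TF Serving starts.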

@spate141, Requesting you to close this issue and follow up with AWS support, since the issue is related to the Amazon EBS volume.

Thank you!