tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

TF Serving gets terminated with 'bad_alloc' error when using tf.searchsorted in the model

ayushjn20 opened this issue

System information - Model building and testing

  • OS Platform and Distribution: Ubuntu 20.04.4 LTS
  • TensorFlow version: 2.9.1
  • Python version: 3.8.10

System information - Serving

  • OS Platform and Distribution: Docker (ver: 20.10.18) on Ubuntu 20.04.4 LTS
  • TensorFlow Serving version: 2.9.1

Issue Description

For bucketization, I took a binary-search approach and implemented the logic in TensorFlow using the tf.searchsorted function. I defined a custom layer (BinSearchLayer) with Keras serialization enabled, so that the layer as a whole can be saved when exporting the model in the SavedModel format.
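
For reference, tf.searchsorted returns, for each query value, the index at which it would be inserted into the sorted sequence, which is exactly the bucket index. A minimal illustration of this idea (my own sketch, not part of the repro):

import tensorflow as tf

# Bucket boundaries and query values (illustrative only).
boundaries = tf.constant([0.0, 1.0, 2.0, 3.0])
values = tf.constant([0.5, 2.5])

# Each result is the insertion index that keeps `boundaries` sorted,
# i.e. the bucket index: 0.5 -> 1, 2.5 -> 3.
print(tf.searchsorted(boundaries, values))  # tf.Tensor([1 3], shape=(2,), dtype=int32)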

After deploying the model with tensorflow-serving and sending an HTTP request with a sample payload, I get the following error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Aborted                 (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Exact Steps to Reproduce

You can use the following code to reproduce the error.

You can also verify that when a custom layer with a similar class structure is used (see the code commented out and marked [ALTERNATE]) but without tf.searchsorted, predictions from the tensorflow-serving container succeed.

Model Building

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model, Input, Sequential

scalar_features = [
    "dummy_1",
    "dummy_2",
]

vector_features = [
    "dummy_3",
    "dummy_4",
]


@tf.keras.utils.register_keras_serializable()
class BinSearchLayer(tf.keras.layers.Layer):
    def __init__(self, sorted_sequence, **kwargs):
        super(BinSearchLayer, self).__init__(**kwargs)
        self.sorted_sequence = sorted_sequence
        self.N = float(len(sorted_sequence))

    def build(self, input_shape, **kwargs):
        self._sorted_sequence = tf.constant(
            self.sorted_sequence, dtype=tf.float32
        )
        self._N = tf.constant(self.N, dtype=tf.float32)
        super(BinSearchLayer, self).build(input_shape, **kwargs)

    def call(self, inputs):
        # Flatten the input, find each value's insertion index in the
        # sorted boundaries, normalize by the boundary count, and
        # restore the original shape.
        return tf.reshape(
            tf.cast(
                tf.searchsorted(
                    sorted_sequence=self._sorted_sequence,
                    values=tf.reshape(inputs, [-1]),
                ),
                dtype=tf.float32,
            )
            / self._N,
            tf.shape(inputs),
        )

    def get_config(self, **kwargs):
        config = super(BinSearchLayer, self).get_config(**kwargs)
        config["sorted_sequence"] = self.sorted_sequence
        return config

    @classmethod
    def from_config(cls, config, custom_objects=None):
        return cls(**config)
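
# (Illustrative sanity check I added, not part of the original repro.)
# With a 4-element boundary list, tf.searchsorted maps 0.5 -> index 1 and
# 2.5 -> index 3, so the layer should output [[0.25], [0.75]] eagerly.
_check = BinSearchLayer(sorted_sequence=[0.0, 1.0, 2.0, 3.0])
print(_check(tf.constant([[0.5], [2.5]])))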


# # [ALTERNATE]
# @tf.keras.utils.register_keras_serializable()
# class SquareScaled(tf.keras.layers.Layer):
#     def __init__(self, scale_factor, **kwargs):
#         super(SquareScaled, self).__init__(**kwargs)
#         self.scale_factor = scale_factor

#     def build(self, input_shape, **kwargs):
#         self._scale_factor = tf.constant(self.scale_factor, dtype=tf.float32)
#         super(SquareScaled, self).build(input_shape, **kwargs)

#     def call(self, inputs):
#         return tf.math.square(inputs) / self.scale_factor

#     def get_config(self, **kwargs):
#         config = super(SquareScaled, self).get_config(**kwargs)
#         config["scale_factor"] = self.scale_factor
#         return config

#     @classmethod
#     def from_config(cls, config, custom_objects=None):
#         return cls(**config)

inputs = {}

for k in scalar_features:
    inputs[k] = Input(shape=[1], dtype=tf.float32, name=k)

for k in vector_features:
    inputs[k] = Input(shape=[4], dtype=tf.float32, name=k)

binsearch_layers = {}
for k in scalar_features:
    binsearch_layers[k] = BinSearchLayer(
        sorted_sequence=np.sort(np.random.rand(150) * 2.5).tolist(),
        name=f"{k}_binsearch",
    )

# # [ALTERNATE]
# for k in scalar_features:
#     binsearch_layers[k] = SquareScaled(
#         scale_factor=np.random.rand(),
#         name=f"{k}_binsearch"
#     )


fc = Sequential(layers=[layers.Dense(m) for m in [16, 8, 1]])
x = []
for k in scalar_features:
    x.append(binsearch_layers[k](inputs[k]))
for k in vector_features:
    x.append(inputs[k])

x = layers.Concatenate(axis=1)(x)
x = fc(x)
model = Model(inputs=inputs, outputs=x)
model.save("models/dummy_model_tfserv/001/")
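
Optionally, before deploying, the SavedModel can be reloaded to confirm the serving signature's input names and dtypes match what TF Serving will expect. This is a check I added for convenience, not part of the original repro:

import tensorflow as tf

# Reload the exported SavedModel and print the serving signature's inputs.
reloaded = tf.saved_model.load("models/dummy_model_tfserv/001/")
serving_fn = reloaded.signatures["serving_default"]
print(serving_fn.structured_input_signature)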

Deploy model with tensorflow-serving

model_name="dummy_model_tfserv"
docker run --rm -p 8111:8501 --name "dc_${model_name}" -it \
    --mount type=bind,source=$(realpath ./models/$model_name),target=/models/$model_name \
    -e MODEL_NAME=$model_name \
    tensorflow/serving:2.9.1
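
Before sending predict requests, TF Serving's model status endpoint can confirm that the model loaded successfully (a sketch assuming the container's REST port 8501 is mapped to 8111 as above):

import requests

# GET the model status; expect a version entry with state "AVAILABLE".
status = requests.get("http://localhost:8111/v1/models/dummy_model_tfserv")
print(status.json())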

HTTP request

sample_data = {
    "dummy_1": [[2.0], [3.0]],
    "dummy_2": [[0.2], [0.3]],
    "dummy_3": [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]],
    "dummy_4": [[0.5, 0.6, 0.7, 0.8], [0.8, 0.7, 0.6, 0.5]],
}

sample_data_np = {
    k: np.array(v)
    for k, v in sample_data.items()
}

print("[DEBUG]", {k: (v.shape, v.dtype) for k, v in sample_data_np.items()})

# NOTE: This works
pred = model.predict(sample_data_np)
print(pred)
# ---

import requests

url = "http://localhost:8111/v1/models/dummy_model_tfserv:predict"

headers = {"content-type": "application/json"}

# NOTE: This gives the error mentioned in the issue description
res = requests.post(url, json={"inputs": sample_data}, headers=headers)
print(res.status_code)
print(res.json())
# ---
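
For completeness, the same prediction can also be attempted over gRPC. This is a sketch assuming the tensorflow-serving-api package is installed and the container's gRPC port is published as well (e.g. -p 8500:8500, which the docker command above does not do):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest with the same sample inputs as the REST call.
request = predict_pb2.PredictRequest()
request.model_spec.name = "dummy_model_tfserv"
request.model_spec.signature_name = "serving_default"
for k, v in sample_data_np.items():
    request.inputs[k].CopyFrom(tf.make_tensor_proto(v, dtype=tf.float32))

response = stub.Predict(request, timeout=10.0)
print(response.outputs)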

I tried different versions and found that this works with 2.9.3 and later but fails with 2.9.2 and earlier. The issue appears to be fixed, so I am closing it for now.