tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Error: Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors

RajezMariner opened this issue · comments

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 40 bits physical, 48 bits virtual

  • TensorFlow Serving installed from (source or binary): Docker

  • TensorFlow Serving version: 2.8.0

Describe the problem

I have a model fine-tuned from the model zoo's "mask_rcnn_inception_resnet_v2". The model was trained with a batch config of 1 and exported with the batch input signature "None, None, None, 3". When I serve the model with TF Serving, inference works, but when TF Serving is started with the --enable_batching parameter, I get the error below as soon as an image is sent in an inference request.

  • Error: Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors

Exact Steps to Reproduce

Docker command to start

export model_path=""
export target_path="/models/potato"
export model_name="potato_segmentation_optimised"
export tf_docker_version="tf_opt_cpu:latest"
export tf_port="5014"
export server_host=""
export batch_config_file="/models/potato/segmentation_batch_config"

sudo docker run -p ${server_host}:${tf_port}:${tf_port} \
  --mount type=bind,source=${model_path},target=${target_path} \
  -t --entrypoint=tensorflow_model_server --name ${model_name} ${tf_docker_version} \
  --port=${tf_port} --model_name=${model_name} --model_base_path=${target_path} --enable_model_warmup \
  --enable_batching --batching_parameters_file=${batch_config_file} &

Batch config file

max_batch_size { value: 4 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 16 }
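
For reference, the same batching_parameters_file text-proto format also accepts allowed_batch_sizes and pad_variable_length_inputs fields. The sketch below is illustrative only; these fields were not part of the configuration used in this report, and whether they change the behaviour here is untested.

max_batch_size { value: 4 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 16 }
# Illustrative additions, not from the original report:
allowed_batch_sizes: 1
allowed_batch_sizes: 2
allowed_batch_sizes: 4
pad_variable_length_inputs: true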

Source code / logs


import grpc
import tensorflow as tf
from tensorflow import make_ndarray
import cv2

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import numpy as np
from datetime import datetime

from threading import current_thread
from concurrent.futures import ThreadPoolExecutor, wait

hostport=""
channel = grpc.insecure_channel(
    hostport,
    options=(('grpc.enable_http_proxy', 0),
             ('grpc.max_send_message_length', 229421142),
             ('grpc.max_receive_message_length', 1056644695)))
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'potato_segmentation_optimised'
request.model_spec.signature_name = 'serving_default'
image_path = "/mnt/HC_Volume_12369511/srikanth/segmentation_model/potato_img.jpeg"
keys = ['detection_boxes', 'detection_scores', 'detection_masks', 'detection_classes', 'num_detections']

def test():
    thread_name = current_thread()
    print(f"The name of the thread: {thread_name}")
    

def perform_inferencing(img_path):
    thread_name = current_thread()
    image = cv2.imread(img_path).astype(np.uint8)
    image = np.expand_dims(image, axis=0)
    print(f"Shape of the image {image.shape}")
    tensor_proto = tf.make_tensor_proto(image, shape=image.shape)
    request.inputs['input_tensor'].CopyFrom(tensor_proto)

    start = datetime.now()
    response = stub.Predict(request, 300)
    end = datetime.now()
    diff = end - start

    print(f"Completed the thread {thread_name} and Difference in time stamps diff: {diff} seconds")

    return response

def run():
    futures = []
    with ThreadPoolExecutor(max_workers=1) as exe:
        for i in range(10):
            futures.append(exe.submit(perform_inferencing, image_path))

    # Gather the responses once all submitted inferencing calls have finished.
    for future in futures:
        response = future.result()
        response.outputs['detection_masks']
        print("Got the result for the detection masks....")

def get_outputs(response):
    cur_thread = current_thread()
    print(f"The current thread: {cur_thread}")
    d = {key: make_ndarray(response.outputs[key]) for key in keys}
    print(f"The current thread completed: {cur_thread}")
    return d


def run_in_loop():
    responses = []
    for i in range(4):
        response = perform_inferencing(image_path)
        responses.append(response)
    start = datetime.now()

    with ThreadPoolExecutor(4) as exe:
        exe.map(get_outputs, responses)
    end = datetime.now()
    diff = end - start
    print(f"Diff: {diff}")

if __name__ == '__main__':
    run_in_loop()

@singhniraj08 - Please let me know if you need anything further on this issue. I missed attaching the model signature earlier; please find it below. The model serves fine without the enable_batching parameter, but with enable_batching the problem occurs.

----------------------------------
signatures: _SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(*, input_tensor) at 0x7FB213BFF190>})
----------------------------------
Infer serving default: ConcreteFunction signature_wrapper(*, input_tensor)
  Args:
    input_tensor: uint8 Tensor, shape=(4, None, None, 3)
  Returns:
    {'anchors': <1>, 'box_classifier_features': <2>, 'class_predictions_with_background': <3>, 'detection_anchor_indices': <4>, 'detection_boxes': <5>, 'detection_classes': <6>, 'detection_masks': <7>, 'detection_multiclass_scores': <8>, 'detection_scores': <9>, 'final_anchors': <10>, 'image_shape': <11>, 'mask_predictions': <12>, 'num_detections': <13>, 'num_proposals': <14>, 'proposal_boxes': <15>, 'proposal_boxes_normalized': <16>, 'raw_detection_boxes': <17>, 'raw_detection_scores': <18>, 'refined_box_encodings': <19>, 'rpn_box_encodings': <20>, 'rpn_objectness_predictions_with_background': <21>}
      <1>: float32 Tensor, shape=(None, 4)
      <2>: float32 Tensor, shape=(1800, 9, 9, 1536)
      <3>: float32 Tensor, shape=(1800, 2)
      <4>: float32 Tensor, shape=(4, 450)
      <5>: float32 Tensor, shape=(4, 450, 4)
      <6>: float32 Tensor, shape=(4, 450)
      <7>: float32 Tensor, shape=(4, 450, 33, 33)
      <8>: float32 Tensor, shape=(4, 450, 2)
      <9>: float32 Tensor, shape=(4, 450)
      <10>: float32 Tensor, shape=(4, 450, 4)
      <11>: float32 Tensor, shape=(4,)
      <12>: float32 Tensor, shape=(1800, 1, 33, 33)
      <13>: float32 Tensor, shape=(4,)
      <14>: float32 Tensor, shape=(4,)
      <15>: float32 Tensor, shape=(4, 450, 4)
      <16>: float32 Tensor, shape=(4, 450, 4)
      <17>: float32 Tensor, shape=(4, 450, 4)
      <18>: float32 Tensor, shape=(4, 450, 2)
      <19>: float32 Tensor, shape=(1800, 1, 4)
      <20>: float32 Tensor, shape=(4, 12288, 4)
      <21>: float32 Tensor, shape=(4, 12288, 2)

@RajezMariner,

I have found a similar issue regarding batching with TF Serving, where a workaround using TensorArray and tf.while_loop was proposed. Kindly take a look and let me know if it helps resolve your issue. Thank you!
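
For illustration only, here is a minimal sketch of what a TensorArray / tf.while_loop wrapper around per-example inference might look like; this is an assumption about the shape of that workaround, and single_example_model (with its fixed [450, 4] box output) is a hypothetical placeholder, not this model's actual code.

import tensorflow as tf

def single_example_model(image):
    # Placeholder standing in for the real per-example detection model; it
    # returns fixed-shape dummy boxes so the sketch runs end to end.
    return {'detection_boxes': tf.zeros([450, 4], tf.float32)}

@tf.function(input_signature=[tf.TensorSpec([None, None, None, 3], tf.uint8)])
def serve_batched(input_tensor):
    batch_size = tf.shape(input_tensor)[0]
    # One TensorArray per output; writing exactly one entry per example keeps
    # the stacked output's 0th dimension equal to the input batch size.
    boxes_ta = tf.TensorArray(tf.float32, size=batch_size)

    def body(i, ta):
        outputs = single_example_model(input_tensor[i])
        return i + 1, ta.write(i, outputs['detection_boxes'])

    _, boxes_ta = tf.while_loop(lambda i, _: i < batch_size, body,
                                [tf.constant(0), boxes_ta])
    return {'detection_boxes': boxes_ta.stack()}  # shape: [batch, 450, 4]

Exporting such a function as the serving_default signature (for example via tf.saved_model.save(module, path, signatures={'serving_default': serve_batched})) would be the export-time side of that workaround; whether it applies cleanly to this Mask R-CNN export is untested.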

@singhniraj08 - I think the above issue relates to client-side changes. I am not sure whether we need changes on the TF Serving side for the error we are receiving.

@nniuzft - Following up: may I know if there are any updates on this issue, please?

@nniuzft @singhniraj08 - May I know if we have any updates on this?

@RajezMariner,

In TF-Serving, batching works by concatenating multiple input tensors along the 0th dimension (which is assumed to be the batch-size dimension), then calling Session::Run() on the concatenated tensor(s), and then splitting the resulting tensor(s). This only works if the resulting tensor's 0th dimension size equals the sum of the 0th-dimension sizes of the input tensors.

A common case is that each input tensor has a 0th-dimension size of 1: TF-Serving concatenates N such tensors into a tensor with dimensions [N, ...], runs Session::Run(), and then receives a tensor with dimensions [not N, ...] -- so it does not know how to attribute the output entries (which output corresponds to which input?).
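
To make that invariant concrete, here is a toy NumPy illustration (not the actual serving code):

import numpy as np

# Three requests with 0th-dimension sizes 1, 1 and 2 are concatenated into one
# batch of size 4 before the model runs.
a = np.zeros((1, 5))
b = np.zeros((1, 5))
c = np.zeros((2, 5))
batched_in = np.concatenate([a, b, c], axis=0)   # shape (4, 5)

# Every output must also have 4 as its 0th dimension so it can be split back
# per request along the same sizes.
batched_out = np.ones((4, 3))
parts = np.split(batched_out, np.cumsum([1, 1, 2])[:-1], axis=0)  # sizes 1, 1, 2

# An output with a 0th dimension other than 4 (e.g. the 1800-row tensors in the
# signature above) cannot be attributed to requests, which is exactly the error
# being reported.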

Can you make a client-side change to send multiple images for prediction to the TF Serving gRPC endpoint, as shown below, and let us know if that works for you? Thank you!

image_data = []
for image in FLAGS.image.split(','):
  with open(image, 'rb') as f:
    image_data.append(f.read())

# tf.make_tensor_proto replaces the TF1-era tf.contrib.util.make_tensor_proto,
# which is not available in TF 2.x.
request.inputs['images'].CopyFrom(
    tf.make_tensor_proto(image_data, shape=[len(image_data)]))

result = stub.Predict(request, 10.0)
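
Note that this model's serving_default takes a decoded uint8 input_tensor of shape (batch, height, width, 3) rather than encoded image strings, so a hedged adaptation of the same idea might look like the sketch below. The image paths are hypothetical, stacking assumes all images share the same height and width, and request and stub are the ones built in the client script above.

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical paths; all images must have identical height and width so they
# can be stacked into a single uint8 batch.
image_paths = ['img_0.jpeg', 'img_1.jpeg', 'img_2.jpeg', 'img_3.jpeg']
images = np.stack([cv2.imread(p).astype(np.uint8) for p in image_paths], axis=0)

# One request carrying a batch of 4, matching the (4, None, None, 3) signature.
request.inputs['input_tensor'].CopyFrom(
    tf.make_tensor_proto(images, shape=images.shape))
result = stub.Predict(request, 30.0)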

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

This issue was closed due to lack of activity after being marked stale for the past 7 days.
