tensorflow / serving

A flexible, high-performance serving system for machine learning models

Home Page: https://www.tensorflow.org/serving

Error: Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors

RajezMariner opened this issue · comments

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 40 bits physical, 48 bits virtual

  • TensorFlow Serving installed from (source or binary): Docker

  • TensorFlow Serving version: 2.8.0

Describe the problem

I have a model fine-tuned from the model zoo's "mask_rcnn_inception_resnet_v2". The model was trained with a batch config of 1 and exported with the batch input signature "None, None, None, 3". When I serve the model with TF Serving, inference works, but when TF Serving is started with the --enable_batching parameter, I get the error below as soon as an image is sent in an inference request.

  • Error: Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors

Exact Steps to Reproduce

Docker command to start

export model_path=""
export target_path="/models/potato"
export model_name="potato_segmentation_optimised"
export tf_docker_version="tf_opt_cpu:latest"
export tf_port="5014"
export server_host=""
export batch_config_file="/models/potato/segmentation_batch_config"

sudo docker run -p ${server_host}:${tf_port}:${tf_port} \
  --mount type=bind,source=${model_path},target=${target_path} \
  -t --entrypoint=tensorflow_model_server --name ${model_name} ${tf_docker_version} \
  --port=${tf_port} --model_name=${model_name} --model_base_path=${target_path} --enable_model_warmup \
  --enable_batching --batching_parameters_file=${batch_config_file} &

Batch config file

max_batch_size { value: 4 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 16 }
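
For reference, the same batching_parameters_file text-proto format also accepts allowed_batch_sizes and pad_variable_length_inputs fields. The sketch below is illustrative only; these fields were not part of the configuration used in this report, and whether they change the behaviour here is untested.

max_batch_size { value: 4 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 16 }
# Illustrative additions, not from the original report:
allowed_batch_sizes: 1
allowed_batch_sizes: 2
allowed_batch_sizes: 4
pad_variable_length_inputs: true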

Source code / logs


import grpc
import tensorflow as tf
from tensorflow import make_ndarray
import cv2

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import numpy as np
from datetime import datetime

from threading import current_thread
from concurrent.futures import ThreadPoolExecutor, wait

hostport=""
channel = grpc.insecure_channel(
    hostport,
    options=(('grpc.enable_http_proxy', 0),
             ('grpc.max_send_message_length', 229421142),
             ('grpc.max_receive_message_length', 1056644695)))
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'potato_segmentation_optimised'
request.model_spec.signature_name = 'serving_default'
image_path = "/mnt/HC_Volume_12369511/srikanth/segmentation_model/potato_img.jpeg"
keys = ['detection_boxes', 'detection_scores', 'detection_masks', 'detection_classes', 'num_detections']

def test():
    thread_name = current_thread()
    print(f"The name of the thread: {thread_name}")
    

def perform_inferencing(img_path):
    thread_name = current_thread()
    image = cv2.imread(img_path).astype(np.uint8)
    image = np.expand_dims(image, axis=0)
    print(f"Shape of the image {image.shape}")
    tensor_proto = tf.make_tensor_proto(image, shape=image.shape)
    request.inputs['input_tensor'].CopyFrom(tensor_proto)

    start = datetime.now()
    response = stub.Predict(request, 300)
    end = datetime.now()
    diff = end - start

    print(f"Completed the thread {thread_name} and Difference in time stamps diff: {diff} seconds")

    return response

def run():
    futures = []
    with ThreadPoolExecutor(max_workers=1) as exe:
        for i in range(10):
            futures.append(exe.submit(perform_inferencing, image_path))

    # Gather the responses once all submitted inferencing calls have finished.
    for future in futures:
        response = future.result()
        response.outputs['detection_masks']
        print("Got the result for the detection masks....")

def get_outputs(response):
    cur_thread = current_thread()
    print(f"The current thread: {cur_thread}")
    d = {key: make_ndarray(response.outputs[key]) for key in keys}
    print(f"The current thread completed: {cur_thread}")
    return d


def run_in_loop():
    responses = []
    for i in range(4):
        response = perform_inferencing(image_path)
        responses.append(response)
    start = datetime.now()

    with ThreadPoolExecutor(4) as exe:
        exe.map(get_outputs, responses)
    end = datetime.now()
    diff = end - start
    print(f"Diff: {diff}")

if __name__ == '__main__':
    run_in_loop()

@singhniraj08 - Please let me know if you need anything further on this issue. I missed attaching the model signature earlier; please find it below. The model serves fine without the enable_batching parameter, but with enable_batching the problem occurs.

----------------------------------
signatures: _SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(*, input_tensor) at 0x7FB213BFF190>})
----------------------------------
Infer serving default: ConcreteFunction signature_wrapper(*, input_tensor)
  Args:
    input_tensor: uint8 Tensor, shape=(4, None, None, 3)
  Returns:
    {'anchors': <1>, 'box_classifier_features': <2>, 'class_predictions_with_background': <3>, 'detection_anchor_indices': <4>, 'detection_boxes': <5>, 'detection_classes': <6>, 'detection_masks': <7>, 'detection_multiclass_scores': <8>, 'detection_scores': <9>, 'final_anchors': <10>, 'image_shape': <11>, 'mask_predictions': <12>, 'num_detections': <13>, 'num_proposals': <14>, 'proposal_boxes': <15>, 'proposal_boxes_normalized': <16>, 'raw_detection_boxes': <17>, 'raw_detection_scores': <18>, 'refined_box_encodings': <19>, 'rpn_box_encodings': <20>, 'rpn_objectness_predictions_with_background': <21>}
      <1>: float32 Tensor, shape=(None, 4)
      <2>: float32 Tensor, shape=(1800, 9, 9, 1536)
      <3>: float32 Tensor, shape=(1800, 2)
      <4>: float32 Tensor, shape=(4, 450)
      <5>: float32 Tensor, shape=(4, 450, 4)
      <6>: float32 Tensor, shape=(4, 450)
      <7>: float32 Tensor, shape=(4, 450, 33, 33)
      <8>: float32 Tensor, shape=(4, 450, 2)
      <9>: float32 Tensor, shape=(4, 450)
      <10>: float32 Tensor, shape=(4, 450, 4)
      <11>: float32 Tensor, shape=(4,)
      <12>: float32 Tensor, shape=(1800, 1, 33, 33)
      <13>: float32 Tensor, shape=(4,)
      <14>: float32 Tensor, shape=(4,)
      <15>: float32 Tensor, shape=(4, 450, 4)
      <16>: float32 Tensor, shape=(4, 450, 4)
      <17>: float32 Tensor, shape=(4, 450, 4)
      <18>: float32 Tensor, shape=(4, 450, 2)
      <19>: float32 Tensor, shape=(1800, 1, 4)
      <20>: float32 Tensor, shape=(4, 12288, 4)
      <21>: float32 Tensor, shape=(4, 12288, 2)

@RajezMariner,

I have found a similar issue regarding batching with TF Serving, where a workaround using TensorArray and tf.while_loop was proposed. Kindly take a look and let me know if it helps resolve your issue. Thank you!
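
For illustration only, here is a minimal sketch of what a TensorArray / tf.while_loop wrapper around per-example inference might look like; this is an assumption about the shape of that workaround, and single_example_model (with its fixed [450, 4] box output) is a hypothetical placeholder, not this model's actual code.

import tensorflow as tf

def single_example_model(image):
    # Placeholder standing in for the real per-example detection model; it
    # returns fixed-shape dummy boxes so the sketch runs end to end.
    return {'detection_boxes': tf.zeros([450, 4], tf.float32)}

@tf.function(input_signature=[tf.TensorSpec([None, None, None, 3], tf.uint8)])
def serve_batched(input_tensor):
    batch_size = tf.shape(input_tensor)[0]
    # One TensorArray per output; writing exactly one entry per example keeps
    # the stacked output's 0th dimension equal to the input batch size.
    boxes_ta = tf.TensorArray(tf.float32, size=batch_size)

    def body(i, ta):
        outputs = single_example_model(input_tensor[i])
        return i + 1, ta.write(i, outputs['detection_boxes'])

    _, boxes_ta = tf.while_loop(lambda i, _: i < batch_size, body,
                                [tf.constant(0), boxes_ta])
    return {'detection_boxes': boxes_ta.stack()}  # shape: [batch, 450, 4]

Exporting such a function as the serving_default signature (for example via tf.saved_model.save(module, path, signatures={'serving_default': serve_batched})) would be the export-time side of that workaround; whether it applies cleanly to this Mask R-CNN export is untested.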

@singhniraj08 - I think the above issue relates to client-side changes. I am not sure whether we need changes on the TF Serving side for the error we are receiving.

@nniuzft - Following up: may I know if there are any updates on this issue, please?

@nniuzft @singhniraj08 - May I know if we have any updates on this?

@RajezMariner,

In TF-Serving, batching works by concatenating multiple input tensors along the 0th dimension (which is assumed to be the batch-size dimension), then calling Session::Run() on the concatenated tensor(s), and then splitting the resulting tensor(s). This only works if the resulting tensor's 0th dimension size equals the sum of the 0th-dimension sizes of the input tensors.

A common case is that each input tensor has a 0th-dimension size of 1: TF-Serving concatenates N such tensors into a tensor with dimensions [N, ...], runs Session::Run(), and then receives a tensor with dimensions [not N, ...] -- so it does not know how to attribute the output entries (which output corresponds to which input?).
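
To make that invariant concrete, here is a toy NumPy illustration (not the actual serving code):

import numpy as np

# Three requests with 0th-dimension sizes 1, 1 and 2 are concatenated into one
# batch of size 4 before the model runs.
a = np.zeros((1, 5))
b = np.zeros((1, 5))
c = np.zeros((2, 5))
batched_in = np.concatenate([a, b, c], axis=0)   # shape (4, 5)

# Every output must also have 4 as its 0th dimension so it can be split back
# per request along the same sizes.
batched_out = np.ones((4, 3))
parts = np.split(batched_out, np.cumsum([1, 1, 2])[:-1], axis=0)  # sizes 1, 1, 2

# An output with a 0th dimension other than 4 (e.g. the 1800-row tensors in the
# signature above) cannot be attributed to requests, which is exactly the error
# being reported.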

Can you make a client-side change to send multiple images for prediction to the TF Serving gRPC endpoint, as shown below, and let us know if that works for you? Thank you!

image_data = []
for image in FLAGS.image.split(','):
  with open(image, 'rb') as f:
    image_data.append(f.read())

# tf.make_tensor_proto replaces the TF1-era tf.contrib.util.make_tensor_proto,
# which is not available in TF 2.x.
request.inputs['images'].CopyFrom(
    tf.make_tensor_proto(image_data, shape=[len(image_data)]))

result = stub.Predict(request, 10.0)
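
Note that this model's serving_default takes a decoded uint8 input_tensor of shape (batch, height, width, 3) rather than encoded image strings, so a hedged adaptation of the same idea might look like the sketch below. The image paths are hypothetical, stacking assumes all images share the same height and width, and request and stub are the ones built in the client script above.

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical paths; all images must have identical height and width so they
# can be stacked into a single uint8 batch.
image_paths = ['img_0.jpeg', 'img_1.jpeg', 'img_2.jpeg', 'img_3.jpeg']
images = np.stack([cv2.imread(p).astype(np.uint8) for p in image_paths], axis=0)

# One request carrying a batch of 4, matching the (4, None, None, 3) signature.
request.inputs['input_tensor'].CopyFrom(
    tf.make_tensor_proto(images, shape=images.shape))
result = stub.Predict(request, 30.0)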

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

This issue was closed due to lack of activity after being marked stale for the past 7 days.
