tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.

Home Page: https://js.tensorflow.org

TensorFlow model is working in Python but converted TFJS model is not working

newgrit1004 opened this issue

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow.js installed from (npm or script link): script tag

Describe the current behavior
I tried to convert the EfficientSAM-Ti (https://github.com/yformer/EfficientSAM) decoder to TensorFlow.js.

The original model is PyTorch, so I needed to convert it step by step.

PyTorch -> ONNX

git clone https://github.com/yformer/EfficientSAM.git

Then replace the code in efficient_sam_decoder.py that uses the torch.tile function with the following code, since torch.tile is not supported in ONNX opset 12.

https://github.com/yformer/EfficientSAM/blob/c9408a74b1db85e7831977c66e9462c6f4891729/efficient_sam/efficient_sam_decoder.py#L259

# Replacement for the torch.tile call; repeat() is supported at opset 12
# and produces the same tiled tensor.
image_embeddings_tiled = image_embeddings.repeat(1, max_num_queries, 1, 1).view(
    batch_size * max_num_queries,
    image_embed_dim_c,
    image_embed_dim_h,
    image_embed_dim_w,
)
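
For context, repeat() followed by view() produces exactly the same tensor as torch.tile for this pattern. A minimal standalone check (sizes here are illustrative, not the model's actual dimensions):

import torch

# Illustrative sizes only; the real model uses its own batch/embedding dims.
batch_size, max_num_queries = 2, 3
c, h, w = 4, 8, 8
x = torch.randn(batch_size, c, h, w)

tiled = torch.tile(x, (1, max_num_queries, 1, 1)).view(batch_size * max_num_queries, c, h, w)
repeated = x.repeat(1, max_num_queries, 1, 1).view(batch_size * max_num_queries, c, h, w)
assert torch.equal(tiled, repeated)  # same values; repeat() exports at opset 12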

Then use this script to export the ONNX model, setting the opset to 12.

https://github.com/yformer/EfficientSAM/blob/main/export_to_onnx.py
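
For reference, a hedged sketch of what that export step looks like (the input/output names and dynamic axes below are assumptions based on the decoder's signature, not copied from export_to_onnx.py):

import torch

torch.onnx.export(
    decoder,                                   # the loaded EfficientSAM decoder module
    (image_embeddings, point_coords, point_labels, orig_im_size),
    "efficient_sam_vitt_decoder_opset12.onnx",
    opset_version=12,                          # avoids the torch.tile export issue
    input_names=["image_embeddings", "batched_point_coords",
                 "batched_point_labels", "orig_im_size"],
    output_names=["masks", "iou_predictions"],
    dynamic_axes={                             # keep the number of points dynamic
        "batched_point_coords": {2: "num_points"},
        "batched_point_labels": {2: "num_points"},
    },
)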

ONNX -> TensorFlow

# tensorflow==2.13
import onnx
import onnx_tf

# Load the ONNX decoder and convert it to a TensorFlow SavedModel.
onnx_model = onnx.load('efficient_sam_vitt_decoder_opset12.onnx')
saved_tensorflow_folder_name = "saved_model"
tf_model = onnx_tf.backend.prepare(onnx_model, auto_cast=True)
tf_model.export_graph(saved_tensorflow_folder_name)
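
Before converting further, it can help to confirm which dimensions survived as dynamic (None) in the SavedModel:

import tensorflow as tf

model = tf.saved_model.load("saved_model")
infer = model.signatures["serving_default"]
# structured_input_signature is (args, kwargs); kwargs maps input name -> TensorSpec
for name, spec in infer.structured_input_signature[1].items():
    print(name, spec.shape, spec.dtype)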

TensorFlow -> TensorFlow.js

pip install tensorflow==2.16.1
tensorflowjs_converter \
        --input_format tf_saved_model \
        --output_format tfjs_graph_model \
        saved_model \
        tfjs_model
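
After conversion, the shapes the converter recorded can be read out of model.json (a sketch; the exact JSON layout can vary by tensorflowjs version, and a size of -1 marks a dynamic axis):

import json

with open("tfjs_model/model.json") as f:
    graph = json.load(f)
# Each input's tensorShape has one dim entry per axis; "-1" means dynamic.
for name, info in graph["signature"]["inputs"].items():
    print(name, [d.get("size") for d in info["tensorShape"]["dim"]])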

I compared the results between PyTorch and TensorFlow and verified that they match.
My TensorFlow inference code is below; it is based on https://github.com/yformer/EfficientSAM/blob/main/EfficientSAM_example.py

import tensorflow as tf
import numpy as np
from PIL import Image

saved_model_path = 'saved_model'
model = tf.saved_model.load(saved_model_path)
infer = model.signatures["serving_default"]

# Load the sample image. The image tensor itself is unused below, because the
# decoder consumes precomputed embeddings (random placeholders here).
sample_image_np = np.array(Image.open("figs/examples/dogs.jpg"))
sample_image_tensor = tf.convert_to_tensor(sample_image_np, dtype=tf.float32)
sample_image_tensor = tf.expand_dims(sample_image_tensor, axis=0)

# Two prompt points, both labeled 1 (foreground): shapes [1, 1, 2, 2] and [1, 1, 2].
input_points = tf.expand_dims(tf.expand_dims(tf.constant([[580, 350], [650, 350]], dtype=tf.float32), 0), 0)
input_labels = tf.expand_dims(tf.expand_dims(tf.constant([1, 1], dtype=tf.float32), 0), 0)
orig_im_size = tf.constant([sample_image_np.shape[0], sample_image_np.shape[1]], dtype=tf.int64)

# Random stand-in for the encoder's image embeddings.
shape = (1, 256, 64, 64)
img_embeddings = np.random.randn(*shape).astype(np.float32)
results = infer(
    image_embeddings=img_embeddings,
    batched_point_coords=input_points,
    batched_point_labels=input_labels,
    orig_im_size=orig_im_size
)
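
As a sanity check that the SavedModel itself handles a variable number of points (which would isolate the failure to the tfjs side), the same signature can be called with different point counts:

# Re-run the signature with 1, 2, and 3 points; if this succeeds, the dynamic
# axis survived the ONNX -> TensorFlow step and the problem is on the tfjs side.
for num_points in (1, 2, 3):
    coords = tf.random.uniform((1, 1, num_points, 2), maxval=1024.0)
    labels = tf.ones((1, 1, num_points), dtype=tf.float32)
    out = infer(
        image_embeddings=tf.convert_to_tensor(img_embeddings),
        batched_point_coords=coords,
        batched_point_labels=labels,
        orig_im_size=orig_im_size,
    )
    print(num_points, {k: v.shape for k, v in out.items()})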

Finally, I want to run the TFJS code, but the error message is very frustrating.

async function loadTFModel() {
    const TFJS_BACKEND = 'webgl';
    const MODEL_PATH = 'tfjs_model/model.json';

    await tf.setBackend(TFJS_BACKEND);
    await tf.ready();
    return await tf.loadGraphModel(MODEL_PATH);
}

const model = await loadTFModel();


const imageEmbeddingsShape = [1, 256, 64, 64];
const imageEmbeddings = tf.randomUniform(imageEmbeddingsShape, 0, 1);

const pointLabelsShape = [1, 1, 3];
const pointLabels = tf.randomUniform(pointLabelsShape, 0, 1);

const origImgSize = tf.tensor([1024, 1024], [2], 'int32');

const pointCoordsShape = [1, 1, 3, 2];
const pointCoords = tf.randomUniform(pointCoordsShape, 0, 1);

console.log(`imageEmbeddings.shape : ${imageEmbeddings.shape}`);
console.log(`pointLabels.shape : ${pointLabels.shape}`);
console.log(`origImgSize.shape : ${origImgSize.shape}`);
console.log(`pointCoords.shape : ${pointCoords.shape}`);

const inputs = {
    'image_embeddings': imageEmbeddings,
    'batched_point_labels': pointLabels,
    'orig_im_size': origImgSize,
    'batched_point_coords': pointCoords
};
const outputs = await model.executeAsync(inputs);

[screenshot of the error message]

I think the dynamic input shapes are the problem:

shape of pointLabels = [1, 1, number of points]
shape of pointCoords = [1, 1, number of points, 2]

But I want to keep the inputs dynamic.

How can I modify the code or command?
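
If keeping the graph fully dynamic turns out to be infeasible in tfjs, one possible fallback is to pad the point inputs to a fixed maximum so every call uses a single static shape. A Python sketch (MAX_POINTS is arbitrary, and the -1 padding label follows the original SAM export convention; verify that EfficientSAM's decoder actually ignores such points before relying on this):

import numpy as np

MAX_POINTS = 8  # illustrative fixed upper bound on prompt points

def pad_points(coords, labels):
    # coords: (n, 2) array of xy points; labels: (n,) array of point labels
    n = len(labels)
    coords_padded = np.zeros((1, 1, MAX_POINTS, 2), dtype=np.float32)
    labels_padded = np.full((1, 1, MAX_POINTS), -1.0, dtype=np.float32)  # -1 = padding
    coords_padded[0, 0, :n] = coords
    labels_padded[0, 0, :n] = labels
    return coords_padded, labels_padded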

@gaikwadrahul8 Thank you for taking this as assignee. Please let me know if you hit any blockers or need any information.

Hi, @newgrit1004

We sincerely apologize for the delayed response. I see you've provided your GitHub repo, but I don't see the tfjs model folder you converted. The exact steps you took leading up to the error message would be highly beneficial for our replication efforts, so we would be grateful if you could add those instructions to the README.md file.

To expedite our investigation into the error you encountered, we would also be grateful if you could share your converted tfjs model, so that I can try to run it with your provided tfjs code and replicate the same behavior on my end.

By gathering this information, we can attempt to reproduce the error on our end and conduct a more thorough root cause analysis.

Thank you for your cooperation and patience.

Hi, @gaikwadrahul8

I forked the repository so you can easily reproduce the result.

Check it out here

I also uploaded the model files, except for some models that exceed 100 MB.

You can use the uploaded model files or generate them by following the README.md.

I hope this issue can be solved. Thank you.

Also, feel free to ask anything while you reproduce this.

Also, the reason I exported the ONNX model with opset version 12 is a known error: exporting at opset 12 avoids it.

Hi, @newgrit1004

I apologize for the delayed response, and thank you for sharing your GitHub repo with us. I'm able to reproduce the same error you reported in your issue template, so we'll have to dig deeper into this issue and will update you soon. Thank you for bringing it to our attention; I really appreciate your valuable time and effort.

For reference, I have added an output screenshot below:

[screenshot of the reproduced error output]

Thank you for your cooperation and patience.

Hi @newgrit1004,

This repo is a TensorFlow version of EfficientSAM, including TFLite and TFJS models. I hope it helps you.

Hi, @kaka-lin
Thanks for the help! However, I need more info. I'll open an issue on your repo.