[MXNet model-r50-am-lfw] MXNet to TensorFlow: a variable (named id) generated from no where

Question

[MXNet model-r50-am-lfw] MXNet to TensorFlow: a variable (named id) generated from no where

tengerye opened this issue 6 years ago · comments

TengQi Ye commented 6 years ago

Platform: ubuntu 16.04

Python version: Python 3.6.4 :: Anaconda

Source framework with version: MXNet 1.1.0 with GPU

Destination framework with version: Tensorflow 1.1.0 with GPU

Pre-trained model path (webpath or webdisk path): https://drive.google.com/open?id=1x0-EiYX9jMUKiq-n1Bd9OCK4fVB3a54v

Running scripts:

python -m mmdnn.conversion._script.convertToIR -f mxnet -n model-symbol.json -w model-0000.params -d resnet50 --inputShape 3 112 112
python -m mmdnn.conversion._script.IRToCode -f tensorflow --IRModelPath resnet50.pb --IRWeightPath resnet50.npy --dstModelPath tf_resnet50.py
python -m mmdnn.conversion.examples.tensorflow.imagenet_test -s tensorflow -p resnet -n tf_resnet50 -w resnet50.npy

TengQi Ye · Answer 1 · Tue Apr 03 2018 16:13:22 GMT+0800 (China Standard Time)

With the first script, I got:

[15:56:05] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.12.1. Attempting to upgrade...
[15:56:05] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
/home/yetengqi/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py:53: UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
	data
  warnings.warn(msg)
Warning: MXNet Parser has not supported operator null with name data.
Warning: convert the null operator with name [data] into input layer.
IR network structure is saved as [resnet50.json].
IR network structure is saved as [resnet50.pb].
IR weights are saved as [resnet50.npy].

With the second script, I got:

Parse file [resnet50.pb] with binary format successfully.
TensorflowEmitter has not supported operator [_copy].
id
Target network code snippet is saved as [tf_resnet50.py].

With the third script, I got:

Traceback (most recent call last):
  File "/home/yetengqi/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/yetengqi/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yetengqi/anaconda3/lib/python3.6/site-packages/mmdnn/conversion/examples/tensorflow/imagenet_test.py", line 71, in <module>
    tester = TestTF()
  File "/home/yetengqi/anaconda3/lib/python3.6/site-packages/mmdnn/conversion/examples/tensorflow/imagenet_test.py", line 20, in __init__
    self.input, self.model = self.MainModel.KitModel(self.args.w)
  File "/media/yetengqi/9e162b12-a483-4c2e-9967-0c36a024d616/home/tengqi/projects/faceRecognition/baseline/insightface/model-r50-am-lfw/tf_resnet50.py", line 26, in KitModel
    _minusscalar0_second = tf.constant(__weights_dict['_minusscalar0_second']['value'], name='_minusscalar0_second')
  File "/home/yetengqi/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 106, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/home/yetengqi/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/yetengqi/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1184, in __init__
    raise ValueError("'%s' is not a valid node name" % node_def.name)
ValueError: '_minusscalar0_second' is not a valid node name

Although I can try to correct that error, you will see another error which is caused by a variable on the line 28 named id.

Cheng(Kit) CHEN · Answer 2 · Tue Apr 03 2018 17:51:30 GMT+0800 (China Standard Time)

Hi @tengerye,
it is because the pre-process of mxnet is not supported in Tensorflow. We will try to fix it. Thanks.

Cheng(Kit) CHEN · Answer 3 · Fri Apr 06 2018 17:49:53 GMT+0800 (China Standard Time)

Ref issue #85. The error is fixed. But seems some accuracy problem. Ping you if there is any progress.

TengQi Ye · Answer 4 · Sun Apr 08 2018 09:37:59 GMT+0800 (China Standard Time)

Well, after updating from the source. I encounter a new error after executing python -m mmdnn.convers ion.examples.tensorflow.imagenet_test -s tensorflow -p resnet -n tf_resnet50 -w resnet50.npy and changing the variable name:

lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "lib/python3.6/site-packages/mmdnn/conversion/examples/tensorflow/imagenet_test.py", line 71, in <module>
    tester = TestTF()
  File "lib/python3.6/site-packages/mmdnn/conversion/examples/tensorflow/imagenet_test.py", line 20, in __init__
    self.input, self.model = self.MainModel.KitModel(self.args.w)
  File "insightface/model-r50-am-lfw/tf_resnet50.py", line 259, in KitModel
    pre_fc1         = tf.layers.dense(bn1, 512, kernel_initializer = tf.constant_initializer(__weights_dict['pre_fc1']['weights']), bias_initializer = tf.constant_initializer(__weights_dict['pre_fc1']['bias']), use_bias = True)
  File "lib/python3.6/site-packages/tensorflow/python/layers/core.py", line 248, in dense
    return layer.apply(inputs)
  File "lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 809, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 680, in __call__
    self.build(input_shapes)
  File "lib/python3.6/site-packages/tensorflow/python/layers/core.py", line 134, in build
    trainable=True)
  File "lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 533, in add_variable
    partitioner=partitioner)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1297, in get_variable
    constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1093, in get_variable
    constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 439, in get_variable
    constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 408, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 800, in _get_single_variable
    use_resource=use_resource)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2157, in variable
    use_resource=use_resource)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2147, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2130, in default_variable_creator
    constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 233, in __init__
    constraint=constraint)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 327, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 784, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py", line 217, in __call__
    self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
  File "lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 488, in make_tensor_proto
    (shape_size, nparray.size))
ValueError: Too many elements provided. Needed at most 262144, but received 12845056

May I ask if there still exists error?

Besides, the original error "ValueError: '_minusscalar0_second' is not a valid node name" still exists. Thank you for your kind reply.

TengQi Ye · Answer 5 · Wed May 09 2018 14:42:58 GMT+0800 (China Standard Time)

The problem is solved according to #85 .
The following are steps to repeat in case you are still confused.

Make sure your codes are up-to-date:
pip install -U git+https://github.com/Microsoft/MMdnn.git@master
Convert the model to intermediate representation format:
python -m mmdnn.conversion._script.convertToIR -f mxnet -n model-symbol.json -w model-0000.params -d resnet50 --inputShape 3 112 112
Convert to tensorflow code:
python -m mmdnn.conversion._script.IRToCode -f tensorflow --IRModelPath resnet50.pb --IRWeightPath resnet50.npy --dstModelPath tf_resnet50.py
Save the following codes to a file tensorflow_inference.py in your current directory:

# inference with tensorflow

from __future__ import absolute_import
import argparse
import numpy as np
from six import text_type as _text_type
from tensorflow.contrib.keras.api.keras.preprocessing import image
import tensorflow as tf


parser = argparse.ArgumentParser()

parser.add_argument('-n', type=_text_type, default='kit_imagenet',
                    help='Network structure file name.')


parser.add_argument('-w', type=_text_type, required=True,
                    help='Network weights file name')

parser.add_argument('--image', '-i',
                    type=_text_type, help='Test image path.',
                    default="mmdnn/conversion/examples/data/seagull.jpg"
)


args = parser.parse_args()
if args.n.endswith('.py'):
    args.n = args.n[:-3]

# import converted model
model_converted = __import__(args.n).KitModel(args.w)
input_tf, model_tf = model_converted

# load img with BGRTranspose=True
img = image.load_img(args.image, target_size = (112, 112))
img = image.img_to_array(img)
img = img[..., ::-1]

input_data = np.expand_dims(img, 0)

# inference with tensorflow
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    predict = sess.run(model_tf, feed_dict = {input_tf : input_data})
print(predict)

Run the following shell scripts:
python tensorflow_inference.py -n tf_resnet50.py -w resnet50.npy -i ~/Downloads/download.jpg
Note the last string (~/Downloads/download.jpg) indicates the image to predict.

Shoufa Chen · Answer 6 · Wed Aug 08 2018 10:46:47 GMT+0800 (China Standard Time)

The second step now should be:

--inputShape 3,112,112

not

 --inputShape 3 112 112

as pointed by here

Akhil Pillai · Answer 7 · Mon Jun 08 2020 18:56:38 GMT+0800 (China Standard Time)

I get "NotImplementedError" when i try aws sagemaker mxnet params