microsoft / MMdnn

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.

Error: Caffe -> IR -> CNTK

aWeinzierl opened this issue · comments

I want to convert a SqueezeNet model to CNTK. For instance, https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1

When I try to generate the CNTK model using the generated python script as well as the .npy file, which I retrieved using the following steps, I am receiving the error posted below.

python -m mmdnn.conversion._script.IRToCode --dstFramework cntk --IRModelPath DSD_SqueezeNet_cmodel.pb --dstModelPath DSD_SqueezeNet_cmodel.py --IRWeightPath DSD_SqueezeNet_cmodel.npy
python -m mmdnn.conversion.examples.cntk.imagenet_test -n DSD_SqueezeNet_cmodel.py -w DSD_SqueezeNet_cmodel.caffemodel --dump DSD_SqueezeNet_cmodel

Error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 426, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x0a'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "SqueezeNetV1_1_cmodel.py", line 13, in load_weights
    weights_dict = np.load(weight_file).item()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 429, in load
    "Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file 'SqueezeNetV1_1_cmodel.caffemodel' as a pickle

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 426, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x0a'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "MMdnn\mmdnn\conversion\examples\cntk\imagenet_test.py", line 59, in <module>
    tester = TestCNTK()
  File "MMdnn\mmdnn\conversion\examples\cntk\imagenet_test.py", line 24, in __init__
    self.model = self.MainModel.KitModel(self.args.w)
  File "SqueezeNetV1_1_cmodel.py", line 22, in KitModel
    __weights_dict = load_weights(weight_file)
  File "SqueezeNetV1_1_cmodel.py", line 15, in load_weights
    weights_dict = np.load(weight_file, encoding='bytes').item()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 429, in load
    "Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file 'SqueezeNetV1_1_cmodel.caffemodel' as a pickle

I guess the -w DSD_SqueezeNet_cmodel.caffemodel should be -w DSD_SqueezeNet_cmodel.npy in the python -m mmdnn.conversion.examples.cntk.imagenet_test command.
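
The traceback above is what np.load produces when it is handed a .caffemodel instead of the IR weights. A minimal sketch of a guard that would fail earlier with a clearer message, assuming the weights file was written with np.save; the helper names looks_like_npy and load_weights_checked are hypothetical and not part of MMdnn:

```python
import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # first six bytes of every file written by np.save


def looks_like_npy(path):
    """Return True if the file starts with the NumPy .npy magic string."""
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC


def load_weights_checked(path):
    """Load a pickled weights dict, rejecting files that are not .npy."""
    if not looks_like_npy(path):
        raise ValueError(
            "%s is not a .npy file; pass the IR weights (.npy), "
            "not the .caffemodel" % path
        )
    # allow_pickle=True is needed on newer NumPy releases to load a pickled dict
    return np.load(path, allow_pickle=True, encoding="bytes").item()
```

With such a check, the mistaken -w argument would be reported as a wrong file type rather than as a pickle failure.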

Of course...
I should have noticed that. Sorry for the inconvenience.
So with your help I managed to convert SqueezeNet1.1.

But unfortunately, trying exactly the same with SqueezeNet1.0 and DSD SqueezeNet fails. This is the error without Caffe:

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\cntk\core.py:82: RuntimeWarning: data is not C contiguous; rearrange your data/computation to avoid costly data conversions
  RuntimeWarning)
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "MMdnn\mmdnn\conversion\examples\cntk\imagenet_test.py", line 59, in <module>
    tester = TestCNTK()
  File "MMdnn\mmdnn\conversion\examples\cntk\imagenet_test.py", line 24, in __init__
    self.model = self.MainModel.KitModel(self.args.w)
  File "SqueezeNetV1_0_cmodel.py", line 89, in KitModel
    pool10          = pooling(relu_conv10, pooling_type=1, pooling_window_shape=(15, 15), strides=(1, 1), auto_padding=[False, False, False], ceil_out_dim=False)
  File "SqueezeNetV1_0_cmodel.py", line 114, in pooling
    layer = ops.pooling(input, **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
    result = f(*args, **kwds)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\cntk\ops\__init__.py", line 430, in pooling
    ceil_out_dim, include_pad, name)
ValueError: Convolution operation requires that kernel dim 15 <= input dim 13.

[CALL STACK]
    > CNTK::Dictionary::  ~Dictionary
    - CNTK::NDMask::  MaskedCount (x2)
    - CNTK::Function::  ~Function
    - CNTK_ReleaseModel
    - RtlRunOnceExecuteOnce
    - InitOnceExecuteOnce
    - _crtInitOnceExecuteOnce
    - CNTK_ReleaseModel
    - CNTK::Function::  RawOutputs
    - CNTK::Internal::  UseSparseGradientAggregationInDataParallelSGD
    - CNTK::Function::  ~Function
    - CNTK_ReleaseModel
    - RtlRunOnceExecuteOnce
    - InitOnceExecuteOnce
    - _crtInitOnceExecuteOnce

With Caffe, I get the following output for both networks:

...
I0110 13:44:59.478663 48776 net.cpp:200] fire2/conv1x1_1_fire2/relu_conv1x1_1_0_split does not need backward computation.
I0110 13:44:59.478663 48776 net.cpp:200] fire2/relu_conv1x1_1 does not need backward computation.
I0110 13:44:59.478663 48776 net.cpp:200] fire2/conv1x1_1 does not need backward computation.
I0110 13:44:59.479667 48776 net.cpp:200] pool1 does not need backward computation.
I0110 13:44:59.479667 48776 net.cpp:200] relu_conv1 does not need backward computation.
I0110 13:44:59.479667 48776 net.cpp:200] conv1 does not need backward computation.
I0110 13:44:59.479667 48776 net.cpp:200] input does not need backward computation.
I0110 13:44:59.479667 48776 net.cpp:242] This network produces output prob
I0110 13:44:59.479667 48776 net.cpp:255] Network initialization done.
W0110 13:44:59.483677 48776 _caffe.cpp:175] DEPRECATION WARNING - deprecated use of Python interface
W0110 13:44:59.484680 48776 _caffe.cpp:176] Use this instead (with the named "weights" parameter):
W0110 13:44:59.484680 48776 _caffe.cpp:178] Net('UserFolder\AppData\Local\Temp\tmpzr5hz2sz.prototxt', 1, weights='DSD_SqueezeNet_cmodel.caffemodel')
Traceback (most recent call last):
  File "download_model_caffe.py", line 53, in create_ir_representation
    convertToIR._convert( args() )
  File "MMdnn\mmdnn\conversion\_script\convertToIR.py", line 9, in _convert
    transformer = CaffeTransformer(args.network, args.weights, "tensorflow", args.inputShape, phase = args.caffePhase)
  File "MMdnn\mmdnn\conversion\caffe\transformer.py", line 321, in __init__
    self.data_injector if self.data_injector else DataInjector(def_path, data_path), # Load and associate learned parameters
  File "MMdnn\mmdnn\conversion\caffe\transformer.py", line 29, in __init__
    self.load_using_caffe()
  File "MMdnn\mmdnn\conversion\caffe\transformer.py", line 35, in load_using_caffe
    net = caffe.Net(str(self.def_path), str(self.data_path), caffe.TEST)
RuntimeError: Could not open file Userfolder\AppData\Local\Temp\tmpzr5hz2sz.prototxt

As before, this is the call I make:

class args(object):
    srcFramework = 'caffe'
    dstPath = destinationFilePrefix
    network = protoFile
    weights = modelFile
    inputShape = [3, 227, 227]
    caffePhase = 'TRAIN'

convertToIR._convert(args())

(This has not changed at all between the last issue and this one; without Caffe this step works.)
Maybe the file is never created, or it is deleted: I cannot find it right after the exception is thrown.

I use the 3.5 CPU Release Version from here: https://github.com/BVLC/caffe/tree/windows

With Caffe, even v1.1 of the SqueezeNet model stopped working (same error).

Hi @aWeinzierl, I tested with Ubuntu 16.04, Python 3.5.2, Caffe 1.0.0, and CNTK 2.3. No error with SqueezeNet 1.1.

  1. Download the pre-trained model
$ python3 -m mmdnn.conversion.examples.caffe.extract_model -n squeezenet -i mmdnn/conversion/examples/data/seagull.jpg
.
.
.
[(21, 0.5285601), (128, 0.071685813), (144, 0.064104252), (416, 0.050044473), (22, 0.049522042)]

Rename:

$ mv deploy.txt squeezenet_v1.1.prototxt
  2. Convert Caffe to IR
$ python3 -m mmdnn.conversion._script.convertToIR -f caffe -d kit_imagenet -n squeezenet_v1.1.prototxt -w squeezenet_v1.1.caffemodel
.
.
.
IR network structure is saved as [kit_imagenet.json].
IR network structure is saved as [kit_imagenet.pb].
IR weights are saved as [kit_imagenet.npy].
  3. Convert IR to CNTK
$ python3 -m mmdnn.conversion._script.IRToCode -f cntk -in kit_imagenet.pb -iw kit_imagenet.npy -d kit_imagenet.py

Parse file [kit_imagenet.pb] with binary format successfully.
Target network code snippet is saved as [kit_imagenet.py].
  4. Test the converted model
$ python3 -m mmdnn.conversion.examples.cntk.imagenet_test -p squeezenet -s caffe -n kit_imagenet.py -w kit_imagenet.npy

[(21, 0.52856004), (128, 0.071685657), (144, 0.064104237), (416, 0.050044276), (22, 0.049522318)]
Test model [squeezenet] from [caffe] passed.

The inference result is almost equal.

  5. Dump to original CNTK model
$ python3 -m mmdnn.conversion.examples.cntk.imagenet_test -n kit_imagenet.py -w kit_imagenet.npy --dump caffe_squeezenet.dnn

CNTK model file is saved as [caffe_squeezenet.dnn], generated by [kit_imagenet.py] and [kit_imagenet.npy].
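
The "almost equal" remark about the two top-5 lists above can be checked numerically. A small sketch using the values printed in this post; the tolerance of 1e-6 and the helper name top5_matches are assumptions chosen for illustration, not values used by MMdnn's own test:

```python
import numpy as np

# Top-5 (class index, probability) pairs copied from the output above
caffe_top5 = [(21, 0.5285601), (128, 0.071685813), (144, 0.064104252),
              (416, 0.050044473), (22, 0.049522042)]
cntk_top5 = [(21, 0.52856004), (128, 0.071685657), (144, 0.064104237),
             (416, 0.050044276), (22, 0.049522318)]


def top5_matches(a, b, atol=1e-6):
    """True if both lists rank the same classes with probabilities within atol."""
    same_classes = [i for i, _ in a] == [i for i, _ in b]
    close = np.allclose([p for _, p in a], [p for _, p in b], rtol=0.0, atol=atol)
    return same_classes and bool(close)


# Largest absolute difference between corresponding probabilities
max_diff = max(abs(pa - pb) for (_, pa), (_, pb) in zip(caffe_top5, cntk_top5))
print(top5_matches(caffe_top5, cntk_top5), max_diff)
```

The class indices are identical and the probabilities differ by less than 1e-6, which is the kind of discrepancy expected from floating-point differences between frameworks.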

Thank you for the step by step guidance. It works with my configuration, too.
Windows 10, Python 3.5, Caffe for Windows: https://github.com/BVLC/caffe/tree/windows and cntk 2.3.1

The problem was that Caffe does not accept training proto files, although the conversion works without any problems with a training proto when I do not use Caffe.
However, because I still wanted to convert the DSD-SqueezeNet model, I converted the trainval.proto to a deployment proto myself: https://github.com/aWeinzierl/SqueezeNet-DSD-Training/blob/master/deploy.prototxt

But I ran into the same problem as before (when I did not use Caffe). This also happens with the official SqueezeNet1.0 model and proto from here, which should rule out a faulty conversion on my part.
Error:

ValueError: Convolution operation requires that kernel dim 15 <= input dim 13.

(See the first traceback in my third post in this issue.)