deepakkumar1984 / MxNet.Sharp

.NET Standard bindings for Apache MxNet with Imperative, Symbolic and Gluon Interface for developing, training and deploying Machine Learning models in C#. https://mxnet.tech-quantum.com/

Error when calling Module.Bind() when using MxNet-CU101.Runtime.Redist

tk4218 opened this issue

To use GPUs, I installed the MxNet-CU101.Runtime.Redist package, which uses mxnet-cu101 v1.5.0.

I wrote the following code to load an existing model and bind parameters to it:

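    // Create the module on GPU 0; the only data input is named "data".
    mModel = new Module(sym, context: new[] { Context.Gpu(0) }, label_names: null, data_names: new[] { "data" });
    // Bind for inference with a 1x3x112x112 input, then set the argument and auxiliary parameters.
    mModel.Bind(new[] { new IO.DataDesc("data", new Shape(1, 3, 112, 112)) }, for_training: false);
    mModel.SetParams(ndaArgParams, ndaAuxParams);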

However, the call to mModel.Bind() produces the following exception:

    System.ArgumentException: Can not pass IntPtr.Zero
    Parameter name: h
       at MxNet.Executor..ctor(IntPtr h, Context context, List`1 gradReqs, Dictionary`2 groupToCtx)
       at MxNet.Symbol.SimpleBind(Context ctx, Dictionary`2 grad_req, Dictionary`2 type_dict, Dictionary`2 stype_dict, Dictionary`2 group2ctx, String[] shared_arg_names, Executor shared_exec, NDArrayDict shared_buffer, DataDesc[] kwargs)
       at MxNet.Modules.DataParallelExecutorGroup.BindiThExec(Int32 i, DataDesc[] data_shapes, DataDesc[] label_shapes, DataParallelExecutorGroup shared_group)
       at MxNet.Modules.DataParallelExecutorGroup.BindExec(DataDesc[] data_shapes, DataDesc[] label_shapes, DataParallelExecutorGroup shared_group, Boolean reshape)
       at MxNet.Modules.DataParallelExecutorGroup..ctor(Symbol symbol, Context[] contexts, Int32[] workload, DataDesc[] data_shapes, DataDesc[] label_shapes, String[] param_names, Boolean for_training, Boolean inputs_need_grad, DataParallelExecutorGroup shared_group, String[] fixed_param_names, OpGradReq grad_req, String[] state_names, Dictionary`2[] group2ctxs)
       at MxNet.Modules.Module.Bind(DataDesc[] data_shapes, DataDesc[] label_shapes, Boolean for_training, Boolean inputs_need_grad, Boolean force_rebind, Module shared_module, OpGradReq grad_req)

In the Symbol.SimpleBind() method, the call to NativeMethods.MXExecutorSimpleBindEx() does not fill in exe_handle, so IntPtr.Zero is passed to the Executor constructor, which throws the exception above. If I switch to the CPU version of libmxnet, the same call works with exactly the same parameters.
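
To find out why the native call leaves exe_handle empty, it may help to check the result right after the call and read the native error text. The sketch below is only an illustration: `NativeGuard` and `EnsureHandle` are hypothetical names that are not part of MxNet.Sharp, and the `lastError` callback is assumed to wrap the MXNet C API function `MXGetLastError` through a P/Invoke declaration of your own.

```csharp
using System;

// Hypothetical helper, not part of MxNet.Sharp: validates the result of a
// native call before its handle is wrapped in a managed object.
public static class NativeGuard
{
    // rc:        return code of the native call (the MXNet C API returns 0 on success).
    // handle:    the output handle the call was expected to fill, e.g. exe_handle.
    // callName:  name of the native function, used only in the error message.
    // lastError: callback that fetches the native error text (assumed to wrap
    //            MXGetLastError); it may be null.
    public static IntPtr EnsureHandle(int rc, IntPtr handle, string callName, Func<string> lastError)
    {
        if (rc != 0 || handle == IntPtr.Zero)
        {
            string detail = lastError?.Invoke();
            if (string.IsNullOrEmpty(detail))
            {
                detail = "no native error message available";
            }
            throw new InvalidOperationException($"{callName} failed (rc={rc}): {detail}");
        }
        return handle;
    }
}
```

Called immediately after the native bind, a check like this would raise an exception carrying the native message instead of the later "Can not pass IntPtr.Zero" ArgumentException, which might show what the GPU build is objecting to.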

I was able to get this working with mxnet-cu101 v1.6.0b20191114. I also tried v1.6.0, but it gave me a very small libmxnet.dll and several very large mxnet_xx.dll files. If there is a way to get v1.6.0 of mxnet-cu101, I believe it would fix the issue.
