Error on Neural Turing Machine Demo

Question

jramapuram opened this issue 9 years ago

numpy version: numpy 1.10.0.post2

~/.cgtrc:

debug = False
precision = single
backend = native
cache_dir = ~/.cgt_cache
enable_inplace_opt = True
enable_simplification = True
parallel = False
force_python_impl = False
debug_cpp = False
verbose = False

The MNIST and variational autoencoder demos seem to work fine, but I'm having issues with the neural Turing machine demo:

(.venv)➜  examples git:(master) ✗ python demo_neural_turing_machine.py 
Traceback (most recent call last):
  File "demo_neural_turing_machine.py", line 469, in <module>
    main()
  File "demo_neural_turing_machine.py", line 415, in main
    ntm = make_ntm(opt)
  File "demo_neural_turing_machine.py", line 199, in make_ntm
    controller = make_ff_controller(opt)
  File "demo_neural_turing_machine.py", line 104, in make_ff_controller
    assert infer_shape(k_bHm) == (b,H,m)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 707, in infer_shape
    return tuple(x.op.value if isinstance(x.op, Constant) else None for x in  CACHER.simplify(cgt.shape(arr)))
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2728, in simplify
    for x in xs: self.simplify1(x)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2733, in simplify1
    update_simplify_map(x, self.analysis, self.repl)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2626, in update_simplify_map
    maybe_pair = process_top_stack_item_and_maybe_get_replacement(stack, analysis, repl)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2589, in process_top_stack_item_and_maybe_get_replacement
    newnewnode = maybe_replace(newnode, analysis, repl)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2689, in maybe_replace
    out = cgt.constant(py_numeric_apply(node, [p.op.value for p in parents]))
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 2926, in py_numeric_apply
    callable.call(vals, out)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 794, in call
    return self._func(*args)    
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 1192, in f
    self.info.pyfunc(reads[0], out=write)
  File "/home/jramapuram/projects/cgt/cgt/core.py", line 1096, in _nu_iceil
    np.ceil(x,out)
TypeError: ufunc 'ceil' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

Sergey Bartunov · Answer 1 · Sun Oct 18 2015 15:53:18 GMT+0800 (China Standard Time)

I have the same problem running this demo, and sometimes with my own code.
Strangely, in my own code I can work around it by disabling simplifications.

Meanwhile, 7 tests fail with the same error.

Jason Ramapuram · Answer 2 · Sun Oct 18 2015 18:01:50 GMT+0800 (China Standard Time)

What did you use for your CUDA build settings? I enabled CUDA and cuDNN and am running CUDA v7.

Sergey Bartunov · Answer 3 · Sun Oct 18 2015 19:03:30 GMT+0800 (China Standard Time)

I don't use CUDA at all, and I'm not sure this error is related to it.
I debugged my own code, which fails with the same error, to find the cause:

def py_numeric_apply(node, vals):
...
        #vals is np.array with dtype=float32
        out = alloc_output(node,vals)
        #out is np.array with dtype=int64 because node.typ.dtype is so
        callable.call(vals, out) 
        # this causes the error since np.ceil always returns floats (even if the input is integer)
        # which cannot be written to int64 array

If I understand it correctly, according to UNARY_INFO the iceil operation should cast its result to an integer somehow, but it doesn't.
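
The mismatch described above can be reproduced with plain NumPy, outside CGT. This is only a minimal sketch: the array names are stand-ins chosen for illustration, and it assumes a NumPy version in which ufunc output casting defaults to 'same_kind', as the error message in the traceback indicates.

```python
import numpy as np

# Hypothetical stand-ins for the values inside py_numeric_apply.
vals = np.array([1.2, 2.7], dtype=np.float32)  # float input
out = np.zeros(vals.shape, dtype=np.int64)     # integer output buffer

try:
    # np.ceil produces a float result; writing it into an int64 buffer
    # is a float -> int cast, which the 'same_kind' rule rejects.
    np.ceil(vals, out)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

If the assumption holds, the call raises a TypeError and the output buffer is never written.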

Sergey Bartunov · Answer 4 · Sun Oct 18 2015 19:15:12 GMT+0800 (China Standard Time)

Changing

def _nu_iceil(x,out=None):
    if out is None:
        return np.ceil(x)
    else:
        np.ceil(x, out)

to

def _nu_iceil(x,out=None):
    if out is None:
        return np.ceil(x)
    else:
        out[...] = np.ceil(x)

fixes the error for me and lets the demo run. My own code, however, then fails with the following error:

Python(87053,0x7fff71141000) malloc: *** mach_vm_map(size=140458053210112) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11
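
As for why the modified _nu_iceil above avoids the TypeError: my understanding is that out[...] = ... goes through NumPy's ordinary array assignment, which permits the float-to-integer cast, while passing out to the ufunc applies the stricter 'same_kind' check. The sketch below is a standalone approximation, not the code from the CGT repository. The name iceil_sketch and the sample arrays are hypothetical. It also shows casting="unsafe", a standard ufunc keyword, as an alternative way to allow the cast.

```python
import numpy as np

def iceil_sketch(x, out=None):
    """Hypothetical ceil that can write into an integer buffer."""
    if out is None:
        return np.ceil(x)
    # Slice assignment casts the float result to out's dtype.
    out[...] = np.ceil(x)
    return out

x = np.array([1.2, 2.7, -0.5], dtype=np.float32)

# Variant 1: assignment-based write, as in the change above.
buf = np.zeros(3, dtype=np.int64)
iceil_sketch(x, out=buf)

# Variant 2: keep the ufunc's out argument but relax the casting rule.
buf2 = np.zeros(3, dtype=np.int64)
np.ceil(x, out=buf2, casting="unsafe")
```

Both variants leave the integer buffer holding the rounded-up values. Neither addresses the malloc failure shown above, which looks like a separate problem.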

Jason Ramapuram · Answer 5 · Thu Nov 05 2015 22:52:07 GMT+0800 (China Standard Time)

This is now fixed by the commit from @sbos [#46].