torch / DEPRECEATED-torch7-distro

Torch7: state-of-the-art machine learning algorithms

Home Page: www.torch.ch

torch.uniform unexpected arguments

hugoperlin opened this issue

Hello,
I'm new to torch, and I'm trying to follow some tutorials. But there is this error with torch.uniform. The error appears when calling the function with parameters like:

torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7fe3de7112d0
    [C]: at 0x7fe3de738060
    [C]: at 0x7fe3ef6fa140  

Tracking the error inside TensorMath.c, I reached line 31542, which contains this instruction:

 lua_call(L, lua_gettop(L)-1, LUA_MULTRET);

Removing the -1 from the instruction makes the expected-arguments error disappear, but then I receive a segmentation fault.

I'm using the latest commit of torch7.

Hmm... I am not seeing this error, and I just pulled in the latest changes.

t7> =torch.uniform(-0.2,0.2)
-0.092454366665334
t7> return torch.uniform(-0.2,0.2)
0.0058977997861803
t7> return torch.uniform(-0.2,0.2)
0.14975507911295
t7> return torch.uniform(-0.2,0.2)
0.10415823804215
t7> return torch.uniform(-0.2,0.2)
-0.12300898190588

I am on Mac OS X 10.8. Can anyone else please confirm this and report the details of their system?

I've never seen it myself, but a few other people mentioned this to me. They were using OpenBLAS; could that have any impact?

I'm using Ubuntu 12.10, with OpenBLAS. I just pulled the latest code, built from scratch, and I'm still getting the error.

t7> =torch.uniform(-1.0,1.0)
    -0,82476633740589   
t7> return torch.uniform(-0.2,1.0)
    0,58297097412869    
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> return torch.uniform(-0.2,1.0)
    -0,096398912370205  
t7> return torch.uniform(-0.2,1.1)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> return torch.uniform(-0.2,1.0)
    0,31845419211313    
t7> return torch.uniform(-0.2,0.1)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
     [C]: at 0x7f42f9126280
     [C]: at 0x7f42f914ca90
     [C]: at 0x7f430a10f140 
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> 

I can confirm this problem on Ubuntu 11.10 as well as on 13.04 beta2.
Just like hugoperlin, I followed the installation steps provided on the torch homepage.

I should have OpenBLAS available, but I suspect that, because of some mistake on my part, it is not being used.
Here is a part of the torch cmake log:

-- Checking for [openblas - gfortran]
--   Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for [openblas - gfortran - pthread]
--   Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for [goto2 - gfortran]
--   Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
-- Checking for [goto2 - gfortran - pthread]
--   Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
-- Checking for [acml - gfortran]
--   Library acml: BLAS_acml_LIBRARY-NOTFOUND
-- Checking for [Accelerate]
--   Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
--   Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Checking for [ptf77blas - atlas - gfortran]
--   Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND
-- Checking for [blas]
--   Library blas: /usr/lib/libblas.so
-- Found a library with BLAS API (generic).
-- Cannot find a library with LAPACK API. Not using LAPACK.

I assume that means the standard plain BLAS is being used.
Is there another way to check which BLAS implementation torch is using? Just to rule out OpenBLAS as the cause of this.

OK, I have a virtual machine with Ubuntu 12.10 and I do not see this.

koray@ubuntu:$ uname -a
Linux ubuntu 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
koray@ubuntu:$ torch
Try the IDE: torch -ide
Type help() for more info
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Lua 5.1 Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7> return torch.uniform(-0.2,0.2)
-0.16391035728157
t7>

I am starting to think that you are using a package that modifies the torch.uniform function, and that might be causing this. Are you loading any packages? Can you please confirm that this problem exists even without having required any packages?

Thanks for your quick reply and your effort!

I freshly installed Kubuntu 13.04 beta2 yesterday and followed the installation steps on http://www.torch.ch/manual/install/index line by line.
I have not installed any packages yet, only bare torch.

Here is my output for the same commands you used above:

[ogh-laptop : ~] ogh 4.2$ uname -a
Linux ogh-laptop 3.8.0-16-generic #26-Ubuntu SMP Mon Apr 1 19:52:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[ogh-laptop : ~] ogh 4.2$ torch
Try the IDE: torch -ide
Type help() for more info
Torch 7.0  Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Can not find any of the default terminals for linux you can manually set terminal by gnuplot.setterm("terminal-name")
Lua 5.1  Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
        [C]: at 0x7f7a529b7480
        [C]: at 0x7f7a529ddc90
        [C]: at 0x7f7a5f73ce90
t7> 

I will set up a virtual machine with Ubuntu 12.10 x86_64 and try to confirm it there.

Hello,
the error happens in a fresh execution of torch, without any other package installed.

Thanks for the responses.

Could you try to build Torch without LuaJIT?

cd torch/build
cmake .. -DWITH_LUA_JIT=0
make
make install

We made it the default recently, maybe it has some issues on specific architectures?
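A quick way to confirm which VM a given build actually runs, assuming only that the global jit table exists solely under LuaJIT:

print(jit and jit.version or _VERSION)  -- e.g. "LuaJIT 2.0.2" or "Lua 5.1"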

I really can't see anything else.

Hello,
it worked!!
Just compiled without LuaJIT and the code works properly.

Thank you.

That was it, thank you so much!
I was getting seriously confused: everything worked in the virtual machine I set up, but back in my working environment it didn't, although the systems are basically the same.

TL;DR: Turning off just-in-time compilation solves the issue: cmake .. -DWITH_LUA_JIT=0
Thanks a lot to everybody involved!

Edit: Sorry for the double post.

Wow, OK, I'm glad it worked, but that's quite frightening. Is your Linux 32-bit or 64-bit?

I don't think it is a 64/32-bit issue. I had 64-bit Linux in my virtual machine as well as on my notebook. It worked in the VM but not on the notebook. Maybe it is rather something related to certain CPU models.

I've got an AMD A4-3300M APU.

Ok, AMD, interesting. We should probably post something on the LuaJIT bug tracker, unless it's on our side.

Hello, any progress on the issue?

I am having the same problem after a fresh installation using the scripts on the torch.ch website.

I also solved the problem by compiling without LuaJIT. My processor is an Intel i5-3210M and I am using a 64-bit Ubuntu system, so the problem is not specific to the AMD architecture.

Any idea how much performance loss to expect from disabling just-in-time support for Lua?

@taygunk you could also try using it with LuaJIT, but turn JIT off by entering the command
jit.off()
in the torch terminal (or placing it before your scripts run).
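A minimal sketch of that workaround (jit.off() is part of LuaJIT's standard jit library; this keeps the LuaJIT build but runs everything on the plain interpreter):

jit.off()                        -- disable just-in-time compilation globally
require 'torch'
print(torch.uniform(-0.2, 0.2))  -- the call that previously raised the error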

I'm also having the same problem.
Turning JIT off with jit.off() didn't work.
Building torch with -DWITH_LUA_JIT=0 solved the problem.

I'm running 32-bit Arch Linux on an Intel processor.
Here's my cmake output when I enable LuaJIT: https://gist.github.com/sigmike/7856507

It's hard to debug this without being able to replicate the bug. Could anyone offer temporary SSH access to any hardware that reproduces it?

@soumith I sent you an email with access details.

That is a little weird; I built torch on the SSH machine that you provided and ran it:

[soumith@beck bin]$ ls
torch torch-lua torch-qlua torch-rocks torch-rocks-admin
[soumith@beck bin]$ ./torch
Unable to connect X11 server (disabling graphics)
Type help() for more info
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
t7> return torch.uniform(-2,2)
0.88313733413815
t7>

It was built with LuaJIT, and it seems to work fine for me.

@sigmike Can you try building it regularly (i.e. with LuaJIT), but running "torch-lua" instead of "torch" (so that it doesn't load the Qt module)? That would be the exact setup that I ran (on your SSH machine), as I can't get X forwarding over SSH on your machine.

You can access my torch on your SSH machine at /home/soumith/local/bin/torch

You have to pass floating-point numbers for the bug to appear:

t7> return torch.uniform(-2,2)
0,11922571156174    
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0xb00a4660
    [C]: at 0xb00cdd50
    [C]: at 0xb7554030  

Ah okay, got it.

It still works fine for me.

[soumith@beck bin]$ hostname
beck.mike
[soumith@beck bin]$ pwd
/home/soumith/local/bin
[soumith@beck bin]$ ./torch
Unable to connect X11 server (disabling graphics)
Type help() for more info
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
t7> return torch.uniform(-0.2,0.2)
0.10058184470981
t7> return torch.uniform(-2,2)
-1.4603152005002
t7> return torch.uniform(-0.2,0.2)
-0.0051071456633508
t7>

I didn't compile it with any BLAS. Let me check what happens if I compile it with OpenBLAS enabled.

Indeed. It looks like it fails only when running it from my local X session; through SSH on my account it works.

I'm not sure how to load torch with torch-lua:

[mike@beck ~]$ torch-lua 
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
> return torch.uniform(-0.2,0.2)
stdin:1: attempt to index global 'torch' (a nil value)
stack traceback:
    stdin:1: in main chunk
    [C]: at 0x0804a150

Okay, with torch-lua, you have to first enter:
require 'torch'
return torch.uniform(-0.2,0.2)

But at least the problem has been isolated to torch-qlua rather than torch-lua.

It's an environment problem, maybe a locale. When I run it with env -i it works. I'll try to identify the variable.

Ah OK, if it's an environment problem, the main issue I can think of is dynamic libraries being preferred over others. Do you have LD_LIBRARY_PATH set? If so, can you remove any folders specific to libraries that you compiled/installed yourself, or unset it altogether, and give that a shot?
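For a quick look at the suspect variables from inside the torch prompt, standard Lua's os.getenv works (nil just means the variable is unset):

print(os.getenv("LD_LIBRARY_PATH"))
print(os.getenv("LC_ALL"), os.getenv("LC_NUMERIC"), os.getenv("LANG"))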

It works with LC_ALL=C.

Otherwise my locales are like this:

LANG=en_US.UTF-8
LC_CTYPE=fr_FR.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE=fr_FR.UTF-8
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=

Ah, maybe the French keyboard people can give more insights: @clementfarabet @andresy

I ruled out OpenBLAS as the cause as well, just for good measure.

It's probably the dot symbol "." being interpreted differently in another locale that's throwing it off.

Probably. I only have to switch LC_NUMERIC to C to make it work.