torch / DEPRECEATED-torch7-distro

Torch7: state-of-the-art machine learning algorithms

Home Page: www.torch.ch

torch.uniform unexpected arguments

hugoperlin opened this issue

Hello,
I'm new to torch, and I'm trying to follow some tutorials. But there is this error with torch.uniform. The error appears when calling the function with parameters like:

torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7fe3de7112d0
    [C]: at 0x7fe3de738060
    [C]: at 0x7fe3ef6fa140  

Tracking the error inside TensorMath.c, I reached line 31542, which contains this instruction:

 lua_call(L, lua_gettop(L)-1, LUA_MULTRET);

Removing the -1 from the instruction makes the expected-arguments error disappear, but then I receive a segmentation fault.

I'm using the latest commit of torch7.

Hmm... I am not seeing this error, and I just pulled in the latest changes.

t7> =torch.uniform(-0.2,0.2)
-0.092454366665334
t7> return torch.uniform(-0.2,0.2)
0.0058977997861803
t7> return torch.uniform(-0.2,0.2)
0.14975507911295
t7> return torch.uniform(-0.2,0.2)
0.10415823804215
t7> return torch.uniform(-0.2,0.2)
-0.12300898190588

I am on Mac OS X 10.8. Can anyone else please confirm this and report the details of their system?

I've never seen it myself, but a few other people mentioned this to me. They were using OpenBLAS; could that have any impact?

I'm using Ubuntu 12.10, with OpenBLAS. I just pulled the latest code, built from scratch, and I'm still getting the error.

t7> =torch.uniform(-1.0,1.0)
    -0,82476633740589   
t7> return torch.uniform(-0.2,1.0)
    0,58297097412869    
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> return torch.uniform(-0.2,1.0)
    -0,096398912370205  
t7> return torch.uniform(-0.2,1.1)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> return torch.uniform(-0.2,1.0)
    0,31845419211313    
t7> return torch.uniform(-0.2,0.1)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
     [C]: at 0x7f42f9126280
     [C]: at 0x7f42f914ca90
     [C]: at 0x7f430a10f140 
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0x7f42f9126280
    [C]: at 0x7f42f914ca90
    [C]: at 0x7f430a10f140  
t7> 

I can confirm this problem on Ubuntu 11.10 as well as on 13.04 beta2.
Just like hugoperlin, I followed the installation steps provided on the torch homepage.

I should have OpenBLAS available, but I suspect that, because of some mistake on my part, it is not being used.
Here is a part of the torch cmake log:

-- Checking for [openblas - gfortran]
--   Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for [openblas - gfortran - pthread]
--   Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for [goto2 - gfortran]
--   Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
-- Checking for [goto2 - gfortran - pthread]
--   Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
-- Checking for [acml - gfortran]
--   Library acml: BLAS_acml_LIBRARY-NOTFOUND
-- Checking for [Accelerate]
--   Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
--   Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Checking for [ptf77blas - atlas - gfortran]
--   Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND
-- Checking for [blas]
--   Library blas: /usr/lib/libblas.so
-- Found a library with BLAS API (generic).
-- Cannot find a library with LAPACK API. Not using LAPACK.

I assume that means the standard plain BLAS is being used.
Is there another way to check which BLAS implementation torch is using? Just to rule out OpenBLAS as the cause of this.

OK, I have a virtual machine with Ubuntu 12.10 and I do not see this.

koray@ubuntu:$ uname -a
Linux ubuntu 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
koray@ubuntu:$ torch
Try the IDE: torch -ide
Type help() for more info
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Lua 5.1 Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7> return torch.uniform(-0.2,0.2)
-0.16391035728157
t7>

I am starting to think that you are using a package that modifies the torch.uniform function, and that might be causing this. Are you loading any packages? Can you please confirm that this problem exists even without having required any packages?

Thanks for your quick reply and your effort!

I freshly installed Kubuntu 13.04 beta2 yesterday and followed the installation steps on http://www.torch.ch/manual/install/index line by line.
I have not installed any packages yet, only bare torch.

Here is my output for the same commands you used above:

[ogh-laptop : ~] ogh 4.2$ uname -a
Linux ogh-laptop 3.8.0-16-generic #26-Ubuntu SMP Mon Apr 1 19:52:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[ogh-laptop : ~] ogh 4.2$ torch
Try the IDE: torch -ide
Type help() for more info
Torch 7.0  Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Can not find any of the default terminals for linux you can manually set terminal by gnuplot.setterm("terminal-name")
Lua 5.1  Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
        [C]: at 0x7f7a529b7480
        [C]: at 0x7f7a529ddc90
        [C]: at 0x7f7a5f73ce90
t7> 

I will set up a virtual machine with Ubuntu 12.10 x86_64 and try to confirm it there.

Hello,
the error happens in a fresh execution of torch, without any other package installed.

Thanks for the responses.

Could you try to build Torch without LuaJIT?

cd torch/build
cmake .. -DWITH_LUA_JIT=0
make
make install

We made it the default recently, maybe it has some issues on specific architectures?
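A quick way to confirm which VM a given build actually runs, assuming only that the global jit table exists solely under LuaJIT:

print(jit and jit.version or _VERSION)  -- e.g. "LuaJIT 2.0.2" or "Lua 5.1"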

I really can't see anything else.

Hello,
it worked!!
Just compiled without LuaJIT and the code works properly.

Thank you.

That was it, thank you so much!
I was getting seriously confused: everything worked in the virtual machine I set up, but back in my working environment it didn't, although the systems are basically the same.

TL;DR: Turning off just-in-time compilation solves the issue: cmake .. -DWITH_LUA_JIT=0
Thanks a lot to everybody involved!

Edit: Sorry for the double post.

Wow, OK, I'm glad it worked, but that's quite frightening. Is your Linux 32-bit or 64-bit?

I don't think it is a 64/32-bit issue. I had 64-bit Linux in my virtual machine as well as on my notebook. It worked in the VM but not on the notebook. Maybe it is rather something related to certain CPU models.

I've got an AMD A4-3300M APU.

Ok, AMD, interesting. We should probably post something on the LuaJIT bug tracker, unless it's on our side.

Hello, any progress on the issue?

I am having the same problem after a fresh installation using the scripts on the torch.ch website.

I also solved the problem by compiling without LuaJIT. My processor is an Intel i5-3210M and I am using a 64-bit Ubuntu system, so the problem is not specific to the AMD architecture.

Any idea how much performance loss to expect from disabling just-in-time support for Lua?

@taygunk you could also try using it with LuaJIT, but turn JIT off by entering the command
jit.off()
in the torch terminal (or placing it before your scripts run).
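A minimal sketch of that workaround (jit.off() is part of LuaJIT's standard jit library; this keeps the LuaJIT build but runs everything on the plain interpreter):

jit.off()                        -- disable just-in-time compilation globally
require 'torch'
print(torch.uniform(-0.2, 0.2))  -- the call that previously raised the error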

I'm also having the same problem.
Turning JIT off with jit.off() didn't work.
Building torch with -DWITH_LUA_JIT=0 solved the problem.

I'm running 32-bit Arch Linux on an Intel processor.
Here's my cmake output when I enable LuaJIT: https://gist.github.com/sigmike/7856507

It's hard to debug this without being able to replicate the bug. Could anyone offer temporary SSH access to any hardware that reproduces it?

@soumith I sent you an email with access details.

That is a little weird; I built torch on the SSH machine that you provided and ran it:

[soumith@beck bin]$ ls
torch torch-lua torch-qlua torch-rocks torch-rocks-admin
[soumith@beck bin]$ ./torch
Unable to connect X11 server (disabling graphics)
Type help() for more info
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
t7> return torch.uniform(-2,2)
0.88313733413815
t7>

It was built with LuaJIT, and it seems to work fine for me.

@sigmike Can you try building it regularly (i.e. with LuaJIT), but running "torch-lua" instead of "torch" (so that it doesn't load the Qt module)? That would be the exact setup that I ran (on your SSH machine), as I can't get X forwarding over SSH on your machine.

You can access my torch on your SSH machine at /home/soumith/local/bin/torch

You have to pass floating-point numbers for the bug to appear:

t7> return torch.uniform(-2,2)
0,11922571156174    
t7> return torch.uniform(-0.2,0.2)
expected arguments: [double] [double] | *DoubleTensor* [double] [double]
stack traceback:
    [C]: at 0xb00a4660
    [C]: at 0xb00cdd50
    [C]: at 0xb7554030  

Ah okay, got it.

It still works fine for me.

[soumith@beck bin]$ hostname
beck.mike
[soumith@beck bin]$ pwd
/home/soumith/local/bin
[soumith@beck bin]$ ./torch
Unable to connect X11 server (disabling graphics)
Type help() for more info
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
t7> return torch.uniform(-0.2,0.2)
0.10058184470981
t7> return torch.uniform(-2,2)
-1.4603152005002
t7> return torch.uniform(-0.2,0.2)
-0.0051071456633508
t7>

I didn't compile it with any BLAS. Let me check what happens if I compile it with OpenBLAS enabled.

Indeed. It looks like it fails only when running it from my local X session; through SSH on my account it works.

I'm not sure how to load torch with torch-lua:

[mike@beck ~]$ torch-lua 
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
> return torch.uniform(-0.2,0.2)
stdin:1: attempt to index global 'torch' (a nil value)
stack traceback:
    stdin:1: in main chunk
    [C]: at 0x0804a150

Okay, with torch-lua, you have to first enter:
require 'torch'
return torch.uniform(-0.2,0.2)

But at least the problem has been isolated to torch-qlua rather than torch-lua.

It's an environment problem, maybe a locale. When I run it with env -i it works. I'll try to identify the variable.

Ah OK, if it's an environment problem, the main issue I can think of is dynamic libraries being preferred over others. Do you have LD_LIBRARY_PATH set? If so, can you remove any folders specific to libraries that you compiled/installed yourself, or unset it altogether, and give that a shot?
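For a quick look at the suspect variables from inside the torch prompt, standard Lua's os.getenv works (nil just means the variable is unset):

print(os.getenv("LD_LIBRARY_PATH"))
print(os.getenv("LC_ALL"), os.getenv("LC_NUMERIC"), os.getenv("LANG"))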

It works with LC_ALL=C.

Otherwise my locales are like this:

LANG=en_US.UTF-8
LC_CTYPE=fr_FR.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE=fr_FR.UTF-8
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=

Ah, maybe the French keyboard people can give more insights: @clementfarabet @andresy

I ruled out OpenBLAS as the cause as well, just for good measure.

It's probably the dot symbol "." being interpreted differently in another locale that's throwing it off.

Probably. I only have to switch LC_NUMERIC to C to make it work.