Classification demo error

shuoyang129 opened this issue 8 years ago

@danielsuo I am using CUDA 7.5, cuDNN v5 and the latest Marvin. When I run the classification demo, I get a segmentation fault, and gdb reports "0x00007ffff2ace01d in cudnnDropoutGetStatesSize () from /usr/local/cudnn/v5/lib64/libcudnn.so.5". I have no idea how to fix it. Is something wrong?
By the way, the MNIST demo runs correctly. This is what I did to run the classification demo:

1. Installed CUDA 7.5 with the .run file; the graphics driver works.
2. Downloaded cudnn-7.5-linux-x64-v5.0-rc.tgz and extracted its include and lib64 directories to /usr/local/cudnn/v5.
3. Downloaded and compiled Marvin.
4. Downloaded the classification model and data.
5. Ran the demo.

The output is:

Hello, World! This is Marvin. I am at a rough estimate thirty billion times more intelligent than you. Let me give you an example.

[New Thread 0x7ffff0598700 (LWP 17850)]
[New Thread 0x7fffe7bff700 (LWP 17851)]
MemoryDataLayer dataTrain loading data:
75.4819 MB
name:image dim[4]={256,3,227,227}
0.5 KB
name:label dim[4]={256,1,1,1}
301.928 KB
name:imagenet1000 227x227x3 mean image dim[3]={3,227,227}
MemoryDataLayer dataTest loading data:
75.4819 MB
name:image dim[4]={256,3,227,227}
0.5 KB
name:label dim[4]={256,1,1,1}
301.928 KB

name:imagenet1000 227x227x3 mean image dim[3]={3,227,227}

Layers: Responses:

dataTest
data[4]={256,3,227,227} RF[1,1] GP[1,1] OF[0,0]
label[4]={256,1,1,1} RF[1,1] GP[1,1] OF[0,0]
conv1 weight[4]={96,3,11,11} bias[4]={1,96,1,1}
conv1[4]={256,96,55,55} RF[11,11] GP[4,4] OF[0,0]
relu1
norm1
norm1[4]={256,96,55,55} RF[11,11] GP[4,4] OF[0,0]
pool1
pool1[4]={256,96,27,27} RF[19,19] GP[8,8] OF[0,0]
conv2 (2 groups) weight[4]={256,48,5,5} bias[4]={1,256,1,1}
conv2[4]={256,256,27,27} RF[51,51] GP[8,8] OF[-16,-16]
relu2
norm2
norm2[4]={256,256,27,27} RF[51,51] GP[8,8] OF[-16,-16]
pool2
pool2[4]={256,256,13,13} RF[67,67] GP[16,16] OF[-16,-16]
conv3 weight[4]={384,256,3,3} bias[4]={1,384,1,1}
conv3[4]={256,384,13,13} RF[99,99] GP[16,16] OF[-32,-32]
relu3
conv4 (2 groups) weight[4]={384,192,3,3} bias[4]={1,384,1,1}
conv4[4]={256,384,13,13} RF[131,131] GP[16,16] OF[-48,-48]
relu4
conv5 (2 groups) weight[4]={256,192,3,3} bias[4]={1,256,1,1}
conv5[4]={256,256,13,13} RF[163,163] GP[16,16] OF[-64,-64]
relu5
pool5
pool5[4]={256,256,6,6} RF[195,195] GP[32,32] OF[-64,-64]
fc6 weight[2]={4096,9216} bias[1]={4096}
fc6[4]={256,4096,1,1} RF[355,355] GP[0,0] OF[0,0]
relu6
drop6

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff2ace01d in cudnnDropoutGetStatesSize ()
from /usr/local/cudnn/v5/lib64/libcudnn.so.5

Shuo Yang 杨硕 · Answer 1 · Tue Apr 12 2016 20:47:43 GMT+0800 (China Standard Time)

I fixed this by adding a call to init() in the Malloc() function. The problem is that in.size() is 0 in the constructor but has become 1 by the time Malloc() runs. Since init() was only called from the constructor, the variables it resizes were left at size 0.

Daniel Suo · Answer 2 · Tue Apr 19 2016 22:52:52 GMT+0800 (China Standard Time)

We moved the resize calls out of init() and into the Malloc() function.

Thanks for this!