I got an error

Question

haibin894609937 opened this issue 7 years ago

Environment: CentOS 7.0 + CUDA 8.0
[hbliu@bogon HieCoAttenVQA-master]$ th train.lua -input_img_train_h5 data/vqa_data_img_vgg_train.h5 -input_img_test_h5 data/vqa_data_img_vgg_test.h5 -input_ques_h5 data/vqa_data_prepro.h5 -input_json data/vqa_data_prepro.json -co_atten_type Alternating -feature_type VGG
{
input_img_train_h5 : "data/vqa_data_img_vgg_train.h5"
learning_rate_decay_every : 300
optim : "rmsprop"
hidden_size : 512
optim_epsilon : 1e-08
output_size : 1000
rnn_layers : 2
input_img_test_h5 : "data/vqa_data_img_vgg_test.h5"
losses_log_every : 600
id : "0"
input_ques_h5 : "data/vqa_data_prepro.h5"
learning_rate_decay_start : 0
start_from : ""
gpuid : 0
seed : 123
input_json : "data/vqa_data_prepro.json"
optim_beta : 0.995
batch_size : 20
iterPerEpoch : 1200
rnn_size : 512
max_iters : -1
checkpoint_path : "save/train_vgg"
save_checkpoint_every : 6000
learning_rate : 0.0004
co_atten_type : "Alternating"
feature_type : "VGG"
backend : "cudnn"
optim_alpha : 0.99
}
Use GPU0
DataLoader loading h5 image file: data/vqa_data_img_vgg_train.h5
DataLoader loading h5 image file: data/vqa_data_img_vgg_test.h5
DataLoader loading h5 question file: data/vqa_data_prepro.h5
DataLoader loading json file: data/vqa_data_prepro.json
assigned 215375 images to split 0
assigned 121512 images to split 2
Building the model...
total number of parameters in word_level: 8031747
total number of parameters in phrase_level: 2889219
total number of parameters in ques_level: 5517315
constructing clones inside the ques_level
total number of parameters in recursive_attention: 2862056
Mask is a nil
/usr/local/torch7/install/bin/luajit: ./misc/word_level.lua:94: the class torch.CudaByteTensor cannot be indexed
stack traceback:
[C]: in function '__newindex'
./misc/word_level.lua:94: in function 'forward'
train.lua:254: in function 'lossFun'
train.lua:311: in main chunk
[C]: in function 'dofile'
...cal/torch7/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004064d0

JIA-HONG HUANG · Answer 1 · Tue Jul 18 2017 17:03:18 GMT+0800 (China Standard Time)

You are working on the VQA dataset, right?
If so, I suspect the problem is in your vqa_data_prepro.json and vqa_data_prepro.h5 files.
Try the other dataset the author provides, cocoqa.
If you replace those two files with cocoqa_data_prepro.json and cocoqa_data_prepro.h5, the code should run. When I made that swap, everything worked.
If it also works for you, you will know the problem is in how the VQA prepro files were generated.
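The swap described above could look roughly like the sketch below. It reuses the flags from the failing command and changes only `-input_ques_h5` and `-input_json`. The `data/` location of the cocoqa files is an assumption, and the image feature paths are left exactly as in the original command, so they may also need to point at features that match the cocoqa split.

```shell
#!/bin/sh
# Sketch: rerun training with the cocoqa prepro files swapped in.
# Assumption: both cocoqa_* files were placed in data/ (path not confirmed in the thread).
set -eu

QUES_H5="data/cocoqa_data_prepro.h5"
QUES_JSON="data/cocoqa_data_prepro.json"

# Same flags as the failing run; only the two prepro inputs differ.
# Image feature paths are unchanged here and may need adjusting for cocoqa.
set -- th train.lua \
  -input_img_train_h5 data/vqa_data_img_vgg_train.h5 \
  -input_img_test_h5 data/vqa_data_img_vgg_test.h5 \
  -input_ques_h5 "$QUES_H5" \
  -input_json "$QUES_JSON" \
  -co_atten_type Alternating \
  -feature_type VGG

# Print the command so it can be checked before launching.
echo "$*"

# Launch only when Torch and both prepro files are actually present.
if command -v th >/dev/null 2>&1 && [ -f "$QUES_H5" ] && [ -f "$QUES_JSON" ]; then
  "$@"
fi
```

If this run gets past the `word_level.lua:94` failure while the VQA run does not, that points at the generated VQA prepro files rather than at the training code.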

Rayen_Liu · Answer 2 · Tue Jul 18 2017 17:17:46 GMT+0800 (China Standard Time)

@Jhhuangkay Thank you for your reply. It runs correctly on Ubuntu 16.04, but training the network takes a long time. My GPU is a TITAN X with 12 GB of memory.