LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering

[BUG]: AttributeError: 'Tensor' object has no attribute 'append'

aleSuglia opened this issue

Hi,

I just spotted a bug in the training script run_img2txt_dist.py. Specifically, when running the code with multiple GPUs, the following exception is raised:

Traceback (most recent call last):
  File "vlp/run_guesswhat_dist.py", line 625, in <module>
    main()
  File "vlp/run_guesswhat_dist.py", line 543, in main
    vqa2_loss.append(ans_loss.item())
AttributeError: 'Tensor' object has no attribute 'append'

Unfortunately, this happens because line 542 (https://github.com/LuoweiZhou/VLP/blob/master/vlp/run_img2txt_dist.py#L542) overwrites vqa2_loss with a torch.Tensor, so the append call on line 543 fails.

Changing line 542 to ans_loss = ans_loss.mean() should fix the error.
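To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern described above; it is not the repository's code. Plain Python floats and statistics.mean stand in for the per-GPU loss tensor and Tensor.mean() so the example runs without PyTorch, and the function name train_steps is made up for illustration. The buggy branch raises AttributeError: 'float' object has no attribute 'append', the same error as in the traceback but with float in place of Tensor.

```python
from statistics import mean


def train_steps(per_step_gpu_losses, fixed=True):
    """Collect one scalar loss per step, mimicking the loop in the issue.

    Hypothetical sketch: each element of per_step_gpu_losses is a list of
    per-GPU losses, standing in for the multi-element loss tensor that the
    real script reduces with ans_loss.mean().
    """
    vqa2_loss = []  # history of scalar losses, one entry per step
    for ans_loss in per_step_gpu_losses:
        if fixed:
            # Fix: reduce into ans_loss, leaving the history list intact.
            ans_loss = mean(ans_loss)
        else:
            # Bug: the reduced value replaces the list bound to vqa2_loss.
            vqa2_loss = mean(ans_loss)
        # In the buggy branch vqa2_loss is now a float, so looking up
        # .append raises AttributeError.
        vqa2_loss.append(ans_loss)
    return vqa2_loss


print(train_steps([[1.0, 3.0], [2.0, 4.0]]))  # prints [2.0, 3.0]
```

The only difference between the two branches is which name receives the reduced value, which is exactly what the one-line fix changes.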

@aleSuglia Yes, it should be ans_loss = ans_loss.mean(), will fix, thanks for the catch!
Note that this part of the code has never been executed, because we use distributed data parallel (see the example on COCO here). The code has not been tested with regular data parallel (i.e., n_gpu > 1), which is slower than distributed data parallel, so we'd suggest using the distributed one. If for some reason you prefer the regular one, please expect some rough edges and use it at your own discretion.

Thanks a lot for your answer @LuoweiZhou. Yeah, that makes sense. Do I have to specify anything in particular to use multiple GPUs? From the code, it looks like I only need to make sure that the program is able to "see" multiple devices. Is that correct?

Yes, in the 2-GPU example, you can specify CUDA_VISIBLE_DEVICES=0,1 for both commands if you want.
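If you launch the commands from Python rather than a shell, a minimal sketch of the same idea is to copy the environment and set CUDA_VISIBLE_DEVICES for the child process. The helper name run_with_gpus and the placeholder child command are hypothetical; substitute the actual commands from the 2-GPU example. In a shell the value must be written without spaces, since CUDA_VISIBLE_DEVICES=0, 1 cmd would assign "0," and then try to run a program named 1.

```python
import os
import subprocess
import sys


def run_with_gpus(cmd, gpu_ids):
    """Run cmd with only the listed GPUs visible (hypothetical helper)."""
    env = dict(os.environ)  # copy so the parent's environment is unchanged
    # Comma-separated IDs with no spaces, e.g. "0,1".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder child that just echoes the variable it sees; replace it
    # with the real training command.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_with_gpus(child, [0, 1]).stdout.strip())  # prints 0,1
```

Copying os.environ keeps PATH and the other variables the child needs, while confining the device restriction to that one process.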