Maluuba / GeNeVA

Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task proposed in our ICCV 2019 paper "Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction"

Home Page: https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/

What is your pytorch version?

zmykevin opened this issue

Hi, I wonder which PyTorch version you use? I ran into some weird warnings, and the one that bothers me the most is this message:
/opt/conda/conda-bld/pytorch_1573049304260/work/aten/src/ATen/native/cudnn/RNN.cpp:1268: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

I think it is because you are applying DataParallel to the RNN, but I am not quite sure how to resolve it.
Thank you for taking a look at this issue.
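
For reference, the installed versions can be printed directly (a minimal check, nothing repo-specific is assumed; torch.version.cuda is None on CPU-only builds):

import torch
print(torch.__version__)                 # PyTorch version string, e.g. "0.4.1"
print(torch.version.cuda)                # CUDA version this build targets
print(torch.backends.cudnn.version())    # cuDNN version used by the RNN kernels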

We are using PyTorch 0.4.1:

GeNeVA/environment.yml

Lines 13 to 14 in 7e8d597

- pytorch=0.4.1=py36_cuda9.0.176_cudnn7.1.2_1
- torchvision=0.2.1=py36_1

Can you tell me the command you are running and the line in the code that generates this warning? We already call flatten_parameters() so this should ideally not happen.

self.gru.flatten_parameters()

I see. For some reason, when I conda install from that environment file, it does not install PyTorch, so I just installed the latest PyTorch to run it.
The command I run is just the one you give to train the model on the CoDraw dataset:

python geneva/inference/train.py @example_args/codraw-d-subtract.args

The warning probably comes from here:

self.rnn = nn.DataParallel(nn.GRU(cfg.input_dim,
                                  cfg.hidden_dim,
                                  batch_first=False), dim=1).cuda()
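
One way to silence the warning in this setup is sketched below; it is an illustration under assumptions, not code from this repository (GRUWrapper and its arguments are made-up names). The idea is to move the flatten_parameters() call into a small module that DataParallel replicates, so the compaction runs inside every replica's forward:

import torch.nn as nn

class GRUWrapper(nn.Module):
    # Illustrative wrapper: DataParallel replicates this module on each forward,
    # so flatten_parameters() runs on every replica and keeps the GRU weights
    # in one contiguous chunk for cuDNN, which is what the warning asks for.
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=False)

    def forward(self, x, hidden=None):
        self.gru.flatten_parameters()
        return self.gru(x, hidden)

# then, instead of wrapping the bare nn.GRU:
# self.rnn = nn.DataParallel(GRUWrapper(cfg.input_dim, cfg.hidden_dim), dim=1).cuda()

This mirrors the self.gru.flatten_parameters() call mentioned above: the compaction has to happen inside whatever module DataParallel replicates, not just once at construction time.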

I will try downgrading the PyTorch version first to see if that resolves the warning.

Another quick question: how much time does it take to train this model with your default settings on 2 P100 GPUs?

Let me know if the warning does not go away with the PyTorch version change.
IIRC, it takes ~3 days to get results comparable to the paper.

Closing due to inactivity. Please reopen with updates if the issue still remains.