princetonvisualai / DomainBiasMitigation


RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

DT-auxi opened this issue · comments

I am trying to run the celeba_gradproj_adv experiment, but the gradient computation is not working properly. The classifier network runs fine, but when the classifier's output is fed to the domain network, which is just a single Linear layer, the model throws an error. I have also set torch.autograd.set_detect_anomaly(True), but I still can't get the code to work. Any help is appreciated.

The error appears in the _train function of the models/celeba_gradproj_adv.py script:

# Update the main network
if self.epoch % self.training_ratio == 0:
    grad_from_class = torch.autograd.grad(class_loss, self.class_network.parameters(),
                                          retain_graph=True, allow_unused=True)
    grad_from_domain = torch.autograd.grad(domain_loss, self.class_network.parameters(),
                                           retain_graph=True, allow_unused=True)

Setting torch.autograd.set_detect_anomaly(True) points to this earlier block in the _train function:

class_outputs, _ = self.class_network(images)
domain_outputs = self.domain_network(class_outputs)
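For reference, anomaly detection is enabled once, before training starts; it makes each backward error report the forward call that created the offending tensor, at the cost of slower execution:

import torch

# Debugging only: slows down both the forward and backward passes.
torch.autograd.set_detect_anomaly(True)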

Here is the full error message:

C:\Users\divya\miniconda3\envs\pytorch\lib\site-packages\torch\autograd\anomaly_mode.py:70: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
  warnings.warn('Anomaly Detection has been enabled. '
Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
  File "main.py", line 15, in <module>
    main(model, opt)
  File "main.py", line 9, in main
    model.train()
  File "D:\thesis\celeba_classifiers\models\celeba_gradproj_adv.py", line 178, in train
    self._train(self.train_loader)
  File "D:\thesis\celeba_classifiers\models\celeba_gradproj_adv.py", line 80, in _train
    domain_outputs = self.domain_network(class_outputs.clone())
  File "C:\Users\divya\miniconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\divya\miniconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\divya\miniconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
 (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Traceback (most recent call last):
  File "main.py", line 15, in <module>
    main(model, opt)
  File "main.py", line 9, in main
    model.train()
  File "D:\thesis\celeba_classifiers\models\celeba_gradproj_adv.py", line 178, in train
    self._train(self.train_loader)
  File "D:\thesis\celeba_classifiers\models\celeba_gradproj_adv.py", line 101, in _train
    retain_graph=True, allow_unused=True)
  File "C:\Users\divya\miniconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 158, in grad
    inputs, allow_unused)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [39, 2]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
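The traceback suggests a likely cause: self.domain_optimizer.step() runs before the torch.autograd.grad calls, and the step updates the domain network's Linear weight in place. That weight (the [39, 2] tensor saved through TBackward, i.e. weight.t()) is still needed by the retained graph, so its version counter no longer matches. A minimal, self-contained sketch that reproduces the same RuntimeError (feat, head, and opt are hypothetical stand-ins, not names from the repo):

import torch

# feat ~ the class network, head ~ the single-Linear domain network.
feat = torch.nn.Linear(4, 3)
head = torch.nn.Linear(3, 2)
opt = torch.optim.SGD(head.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = head(feat(x)).sum()
loss.backward(retain_graph=True)

opt.step()  # in-place update of head.weight bumps its version counter

# Fails: backprop through head needs the weight saved in the retained
# graph, but that tensor has since been modified in place.
torch.autograd.grad(loss, feat.parameters())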

Have you solved it?

I've observed the same issue. Are there any solutions?

The original code works with older versions of PyTorch (around v1.0). For newer versions, the problem can be solved by moving the two gradient-computation lines

grad_from_class = torch.autograd.grad(class_loss, self.class_network.parameters(),
                                      retain_graph=True, allow_unused=True)
grad_from_domain = torch.autograd.grad(domain_loss, self.class_network.parameters(),
                                       retain_graph=True, allow_unused=True)

so that they run right before the domain classifier update:

self.domain_optimizer.step()
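Concretely, the same kind of toy setup no longer errors once the gradients are taken before the in-place step (a minimal sketch; feat, head, and opt are hypothetical stand-ins for the class network, domain network, and domain optimizer, not the repo's actual code):

import torch

feat = torch.nn.Linear(4, 3)   # stands in for self.class_network
head = torch.nn.Linear(3, 2)   # stands in for self.domain_network
opt = torch.optim.SGD(head.parameters(), lr=0.1)  # ~ self.domain_optimizer

x = torch.randn(8, 4)
domain_loss = head(feat(x)).sum()
domain_loss.backward(retain_graph=True)

# Take the gradients w.r.t. the main network while the retained graph
# still matches the parameter values it saved.
grad_from_domain = torch.autograd.grad(domain_loss, feat.parameters(),
                                       retain_graph=True, allow_unused=True)

# Only now update the domain classifier in place.
opt.step()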