RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1, 512, 8, 8]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead.

Question

thompsondd opened this issue a year ago · comments

Nguyễn Huỳnh Hải Đăng (Thompsond) commented a year ago

I ran NAS-Bench-101 with the zero-cost predictors in NASLib and got the error in the title.

Has anyone tackled this problem?

Arjun Krishnakumar · Answer 1 · Fri Aug 18 2023 17:37:32 GMT+0800 (China Standard Time)

Hi @thompsondd,

Could you please tell us which proxy you were using? Looks to me like removing an inplace relu operation somewhere in the Nasbench101 graph will fix the issue.

Thanks!
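For readers unfamiliar with this error, the following minimal sketch reproduces the same message outside NASLib. It assumes only that PyTorch is installed; the function names are made up for illustration and nothing here is taken from the NAS-Bench-101 code. ReLU's backward pass reuses the ReLU output, so any in-place edit of that output before `backward()` invalidates it.

```python
import torch


def relu_then_inplace():
    """Modify a ReLU output in place, then backpropagate (expected to fail)."""
    x = torch.randn(1, 4, dtype=torch.double, requires_grad=True)
    y = torch.relu(x)   # autograd saves y to compute ReLU's gradient later
    y.add_(1.0)         # in-place edit bumps y's version counter from 0 to 1
    y.sum().backward()  # ReluBackward0 sees version 1 but expected version 0


def relu_then_out_of_place():
    """Same computation, but the addition allocates a new tensor."""
    x = torch.randn(1, 4, dtype=torch.double, requires_grad=True)
    y = torch.relu(x)
    z = y + 1.0         # y itself is left untouched
    z.sum().backward()
    return x.grad


try:
    relu_then_inplace()
except RuntimeError as err:
    print("in-place variant failed:", "inplace operation" in str(err))

print("out-of-place gradient shape:", tuple(relu_then_out_of_place().shape))
```

The same pattern can arise when the in-place step is a later `nn.ReLU(inplace=True)`, a `+=`, or any other in-place op applied to a tensor that an earlier ReLU produced.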

Nguyễn Huỳnh Hải Đăng (Thompsond) · Answer 2 · Fri Aug 18 2023 20:57:01 GMT+0800 (China Standard Time)

Thank you for your reply, @Neonkraft.

I am trying to use the Synflow proxy on NAS-Bench-101, but the arch "(0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 3, 3, 1)" raises this error (this is just one of the failing cases I have encountered).

Following the source code, I removed the inplace option of the ReLU operation in https://github.com/automl/NASLib/blob/8c45f19dc259956c3bd253071135c798ad3df8ce/naslib/search_spaces/nasbench101/base_ops.py#L18C3-L19C10, but nothing changed.

Could you please tell me what you changed in the code?
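Editing one file may miss other in-place activations, so one way to check is to switch the flag off on every module of the built network. The helper below is a hypothetical sketch, not part of NASLib: `disable_inplace` is an invented name, and it assumes the parsed graph exposes its operations through the standard `torch.nn.Module.modules()` iterator, which has not been verified here. It is demonstrated on a toy `nn.Sequential` only.

```python
import torch.nn as nn


def disable_inplace(model: nn.Module) -> int:
    """Set inplace=False on every submodule that has the flag enabled.

    Returns the number of modules changed. Hypothetical helper for
    illustration; it is not a NASLib function.
    """
    changed = 0
    for module in model.modules():
        # nn.ReLU, nn.LeakyReLU, nn.Dropout and others carry an `inplace` attribute.
        if getattr(module, "inplace", False):
            module.inplace = False
            changed += 1
    return changed


# Toy stand-in for the real network (assumption: the real one is an nn.Module).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 8, kernel_size=3),
    nn.ReLU(inplace=True),
)
print("modules switched to out-of-place:", disable_inplace(toy))
```

This does not reach functional calls such as `F.relu(x, inplace=True)` or operators like `+=` inside a `forward` method. To find those, `torch.autograd.set_detect_anomaly(True)` can be enabled before the failing call so that the error also reports the forward-pass traceback of the operation whose backward failed.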

Abhash Jha · Answer 3 · Fri Aug 25 2023 21:17:51 GMT+0800 (China Standard Time)

Hi @thompsondd,

I have tried to reproduce your error and had no problem evaluating the zero-cost score for the architecture. Here's a snippet of code that I tried. You can correct me if it doesn't exactly match your case.

import logging
from naslib.predictors import ZeroCost
from naslib import utils
from naslib.utils import setup_logger, get_dataset_api
from naslib.search_spaces.nasbench101.conversions import convert_tuple_to_spec
from naslib.search_spaces import NasBench101SearchSpace


# Load the zero-cost config from the command-line arguments and set up logging.
config = utils.get_config_from_args(config_type="zc")
logger = setup_logger(config.save + "/log.log")
logger.setLevel(logging.INFO)

utils.set_seed(config.seed)
utils.log_args(config)

# Build the NAS-Bench-101 graph and set it to the architecture from the report.
dataset_api = get_dataset_api("nasbench101", config.dataset)
graph = NasBench101SearchSpace(n_classes=10)
test_arch = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
             0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
             0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 3, 3, 1)
spec = convert_tuple_to_spec(test_arch)
graph.set_spec(spec)

# Score the architecture with the Synflow zero-cost proxy.
predictor = ZeroCost(method_type="synflow")
train_loader, _, test_loader, _, _ = utils.get_train_val_loaders(config)
graph.parse()
score = predictor.query(graph, train_loader)
print("Zero cost score:", score)
logger.info('Test experiment complete.')

I got a Synflow score of 125.99.

Is it possible that I missed something, or that this is a version-related problem? Maybe you can also try running the same snippet and tell us what you get.
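To compare environments when chasing a possible version mismatch, both sides could post the output of a small script like the one below. It is a sketch using only the standard library; the distribution names `torch` and `naslib` are assumptions about how the packages were installed, and `report_versions` is an invented helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("torch", "naslib")):
    """Return {distribution name: installed version or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is missing or installed under a different name.
            report[name] = "not installed"
    return report


for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```

If NASLib was installed from a source checkout, the commit hash of that checkout is likely more informative than the reported package version.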