ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

Home Page: https://rl4.co

[BUG] 'RuntimeError' related to `deepcopy`

Goh-IE opened this issue

Hello!

I am trying to run the `1-quickstart.ipynb` notebook and get a `RuntimeError` from `deepcopy`. It is raised when the `RolloutBaseline` class duplicates the model and moves it to a device with `copy.deepcopy(model).to(device)`.

Is there a recommended workaround or fix for this issue? Any advice would be greatly appreciated!
(Environment: python: 3.11.5, rl4co: 0.3.2, torch: 2.2.1, torchrl: 0.3.0, tensordict: 0.3.1)

For reference, the specific error message is as follows:
Traceback (most recent call last):
File "/home/wsgoh/code/24RLKP/rlcopackage-v1/0_tutorial.py", line 47, in
trainer.fit(model)
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/utils/trainer.py", line 145, in fit
super().fit(
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 543, in fit
call._call_and_handle_interrupt(
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 579, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 948, in _run
call._call_setup_hook(self) # allow user to set up LightningModule in accelerator environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 94, in _call_setup_hook
_call_lightning_module_hook(trainer, "setup", stage=fn)
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/models/rl/common/base.py", line 153, in setup
self.post_setup_hook()
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/models/rl/reinforce/reinforce.py", line 110, in post_setup_hook
self.baseline.setup(
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/models/rl/reinforce/baselines.py", line 111, in setup
self.baseline.setup(*args, **kw)
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/models/rl/reinforce/baselines.py", line 167, in setup
self._update_model(*args, **kw)
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/rl4co/models/rl/reinforce/baselines.py", line 173, in _update_model
self.model = copy.deepcopy(model).to(device)
^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 206, in _deepcopy_list
append(deepcopy(a, memo))
^^^^^^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/copy.py", line 153, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/home/wsgoh/anaconda3/lib/python3.11/site-packages/torch/_tensor.py", line 86, in deepcopy
raise RuntimeError(
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment. If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see pytorch/pytorch#103001
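
The thread does not say which attribute of the model triggered the error, so the following is only a minimal sketch of the general failure mode named in the message. It assumes a hypothetical `CachingModule` that keeps forward outputs in a plain Python list; `copy.deepcopy` walks the module's `__dict__`, reaches the list, and fails on the stored non-leaf tensor, which mirrors the `_deepcopy_dict` → `_deepcopy_list` → `torch/_tensor.py` frames above. Detaching the stored tensors is one possible workaround, not necessarily the fix applied in rl4co.

```python
import copy

import torch
import torch.nn as nn


class CachingModule(nn.Module):
    """Hypothetical module that stores forward outputs in a plain list."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.cache = []  # plain Python list, reached by deepcopy via __dict__

    def forward(self, x):
        out = self.linear(x)
        self.cache.append(out)  # `out` has a grad_fn, so it is not a graph leaf
        return out


model = CachingModule()
model(torch.randn(1, 2))

# Deep-copying now hits the cached non-leaf tensor and raises RuntimeError.
try:
    copy.deepcopy(model)
    failed_before_detach = False
except RuntimeError as err:
    failed_before_detach = True
    print(f"deepcopy failed: {err}")

# Detached tensors are graph leaves, so the module can be deep-copied.
model.cache = [t.detach() for t in model.cache]
clone = copy.deepcopy(model)
```

Clearing such a cache before copying would work as well; detaching simply keeps the stored values while dropping their autograd history.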

Hi @Goh-IE!
The bug has been found and fixed. The hotfix will ship soon in 0.3.3, alongside some new features. In the meantime, you can install the bleeding-edge version via:

pip install -U git+https://github.com/ai4co/rl4co.git
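
After reinstalling, it can help to confirm which rl4co build the interpreter actually picks up. The snippet below is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the version string a git install reports is not stated in this thread, so treat the printed value as informational only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if rl4co is not installed in the active environment.
print("rl4co:", installed_version("rl4co"))
```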