RL4CO\rl4co-main\notebooks\tutorials\3-change-encoder.ipynb: the file is not functioning properly

Question

lihaoya5 opened this issue 5 months ago · comments

Describe the bug

I am using Python 3.10. I installed the packages with pip install rl4co and pip install torch_geometric, and the following error occurred:

Change the Encoder: Error

AttributeError Traceback (most recent call last)
Cell In[6], line 12
3 from rl4co.models.nn.graph.mpnn import MessagePassingEncoder
5 gcn_encoder = GCNEncoder(
6 env_name='cvrp',
7 embedding_dim=128,
8 num_nodes=20,
9 num_layers=3,
10 )
---> 12 mpnn_encoder = MessagePassingEncoder(
13 env_name='cvrp',
14 embedding_dim=128,
15 num_nodes=20,
16 num_layers=3,
17 )
19 model = AttentionModel(
20 env,
21 baseline='rollout',
(...)
26 }
27 )
29 trainer = RL4COTrainer(
30 max_epochs=3, # few epochs for demo
31 accelerator='gpu',
32 devices=1,
33 logger=False,
34 )

File ~\.conda\envs\rlco\lib\site-packages\rl4co\models\nn\graph\mpnn.py:100, in MessagePassingEncoder.__init__(self, env_name, embedding_dim, num_nodes, num_layers, init_embedding, aggregation, self_loop, residual)
96 self.edge_index = torch.permute(torch.nonzero(adj_matrix), (1, 0))
98 # Init message passing models
99 self.mpnn_layers = nn.ModuleList(
--> 100 [
101 MessagePassingLayer(
102 node_indim=embedding_dim,
103 node_outdim=embedding_dim,
104 edge_indim=1,
105 edge_outdim=1,
106 aggregation=aggregation,
107 residual=residual,
108 )
109 for _ in range(num_layers)
110 ]
111 )
113 # Record parameters
114 self.self_loop = self_loop

File ~\.conda\envs\rlco\lib\site-packages\rl4co\models\nn\graph\mpnn.py:101, in <listcomp>(.0)
96 self.edge_index = torch.permute(torch.nonzero(adj_matrix), (1, 0))
98 # Init message passing models
99 self.mpnn_layers = nn.ModuleList(
100 [
--> 101 MessagePassingLayer(
102 node_indim=embedding_dim,
103 node_outdim=embedding_dim,
104 edge_indim=1,
105 edge_outdim=1,
106 aggregation=aggregation,
107 residual=residual,
108 )
109 for _ in range(num_layers)
110 ]
111 )
113 # Record parameters
114 self.self_loop = self_loop

File ~\.conda\envs\rlco\lib\site-packages\rl4co\models\nn\graph\mpnn.py:29, in MessagePassingLayer.__init__(self, node_indim, node_outdim, edge_indim, edge_outdim, aggregation, residual, **mlp_params)
19 def __init__(
20 self,
21 node_indim,
(...)
27 **mlp_params,
28 ):
---> 29 super(MessagePassingLayer, self).__init__(aggr=aggregation)
30 # Init message passing models
31 self.edge_model = MLP(
32 input_dim=edge_indim + 2 * node_indim, output_dim=edge_outdim, **mlp_params
33 )

File ~\.conda\envs\rlco\lib\site-packages\torch_geometric\nn\conv\message_passing.py:170, in MessagePassing.__init__(self, aggr, aggr_kwargs, flow, node_dim, decomposed_layers)
168 if not self.propagate.__module__.startswith(jinja_prefix):
169 if self.inspector.can_read_source:
--> 170 module = module_from_template(
171 module_name=f'{jinja_prefix}_propagate',
172 template_path=osp.join(root_dir, 'propagate.jinja'),
173 tmp_dirname='message_passing',
174 # Keyword arguments:
175 module=self.__module__,
176 collect_name='collect',
177 signature=self._get_propagate_signature(),
178 collect_param_dict=self.inspector.get_flat_param_dict(
179 ['message', 'aggregate', 'update']),
180 message_args=self.inspector.get_param_names('message'),
181 aggregate_args=self.inspector.get_param_names('aggregate'),
182 message_and_aggregate_args=self.inspector.get_param_names(
183 'message_and_aggregate'),
184 update_args=self.inspector.get_param_names('update'),
185 fuse=self.fuse,
186 )
188 # Cache to potentially disable later on:
189 self.__class__._orig_propagate = self.__class__.propagate

File ~\.conda\envs\rlco\lib\site-packages\torch_geometric\template.py:37, in module_from_template(module_name, template_path, tmp_dirname, **kwargs)
35 sys.modules[module_name] = module
36 assert spec.loader is not None
---> 37 spec.loader.exec_module(module)
38 return module

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~\.cache\pyg\message_passing\rl4co.models.nn.graph.mpnn_MessagePassingLayer_propagate.py:25
21 from torch_geometric.utils.sparse import ptr2index
22 from torch_geometric.typing import SparseTensor
---> 25 class CollectArgs(NamedTuple):
26 edge_features: torch._VariableFunctionsClass.tensor
27 index: Tensor

File ~\.cache\pyg\message_passing\rl4co.models.nn.graph.mpnn_MessagePassingLayer_propagate.py:26, in CollectArgs()
25 class CollectArgs(NamedTuple):
---> 26 edge_features: torch._VariableFunctionsClass.tensor
27 index: Tensor
28 ptr: typing.Optional[Tensor]

File ~\AppData\Roaming\Python\Python310\site-packages\torch\__init__.py:1833, in __getattr__(name)
1830 import importlib
1831 return importlib.import_module(f".{name}", __name__)
-> 1833 raise AttributeError(f"module '{__name__}' has no attribute '{name}'")

AttributeError: module 'torch' has no attribute '_VariableFunctionsClass'

How should I fix this error?

Chuanbo HUA · Answer 1 · Sat Mar 02 2024 01:12:02 GMT+0800 (China Standard Time)

Hi @lihaoya5, could you share your Python, RL4CO, PyTorch, and PyG versions? I tested the notebooks/tutorials/3-change-encoder.ipynb with

Python v3.11.5
RL4CO v0.3.2
PyTorch v2.1.2+cu121
Torch Geometric v2.4.0

and it passed. Could you also provide a minimal code example to reproduce the bug? I suspect that this error is caused by a package version mismatch.
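
A suspected version mismatch like this one can be checked before running the notebook. The following is a minimal sketch, not part of RL4CO: it uses only the standard library's importlib.metadata to compare installed distributions against the combination reported as working above. The TESTED dictionary and the helper names are illustrative assumptions, and the distribution names (rl4co, torch, torch_geometric) are assumed to match the pip install commands quoted in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Combination reported as working in this thread (an assumed reference point,
# not an official compatibility matrix).
TESTED = {
    "rl4co": "0.3.2",
    "torch": "2.1.2",
    "torch_geometric": "2.4.0",
}


def base_version(v):
    """Drop a local build tag such as '+cu121', so '2.1.2+cu121' -> '2.1.2'."""
    return v.split("+", 1)[0]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare(installed, tested=TESTED):
    """Return {name: (installed, tested)} for every package that differs."""
    diffs = {}
    for name, want in tested.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            diffs[name] = (have, want)
    return diffs


if __name__ == "__main__":
    found = {name: installed_version(name) for name in TESTED}
    for name, (have, want) in compare(found).items():
        print(f"{name}: installed {have}, tested with {want}")
```

An empty result only means the three listed packages match the tested combination; it does not rule out mismatches in other dependencies.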

lihaoya5 · Answer 2 · Sun Mar 03 2024 15:34:30 GMT+0800 (China Standard Time)

Thanks for the reply, I'll try your versions. Here is my configuration; I tested notebooks/tutorials/3-change-encoder.ipynb with
Python 3.10.13
RL4CO 0.3.0
torch 2.2.1+cu118
torch-geometric 2.5.0
I have a couple of questions:

1. When I pip install RL4CO==0.3.0, torch 2.2.1 is installed by default.
2. I looked at the Geometric library (https://github.com/lgray/pytorch_geometric). Before installing Geometric with pip, four packages needed to be installed: torch-scatter, torch-sparse, torch-cluster, and torch-spline. Do I need to install these four packages, or can I pip install torch_geometric directly?
3. When I run 1-quickstart.ipynb and 1-training-loop-advanced.ipynb with the above configuration, I don't get an error, but when I run 4-search-methods.ipynb and 3-change-encoder.ipynb, the error occurs.

Finally, here is my GPU configuration.
Sun Mar 3 15:16:40 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   35C    P8               3W /  75W |   1152MiB /  4096MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Federico Berto · Answer 3 · Sun Mar 03 2024 16:03:19 GMT+0800 (China Standard Time)

@lihaoya5 could you update rl4co to the latest version?
You can do so with pip install --upgrade rl4co

Also could you report further versions by running the following script?

python -c "import rl4co, torch, lightning, torchrl, tensordict, numpy, sys; print('RL4CO:', \
 rl4co.__version__, '\nPyTorch:', torch.__version__, '\nPyTorch Lightning:', \
lightning.__version__, '\nTorchRL:',  torchrl.__version__, '\nTensorDict:',\
 tensordict.__version__, '\nNumpy:', numpy.__version__, '\nPython:', \
sys.version, '\nPlatform:', sys.platform)"

lihaoya5 · Answer 4 · Sun Mar 03 2024 16:34:33 GMT+0800 (China Standard Time)

I used the conda list command in the conda environment to check the package versions, as follows:

# packages in environment at C:\Users\qian\.conda\envs\rl4co:

The above are all the packages installed in this environment.

lihaoya5 · Answer 5 · Sun Mar 03 2024 16:44:18 GMT+0800 (China Standard Time)

I just tried pip install --upgrade rl4co and got the following error: my CUDA is uninstalled and the GPU is unusable.

Chuanbo HUA · Answer 6 · Sun Mar 03 2024 16:53:40 GMT+0800 (China Standard Time)

The error reported by pip can be resolved by also updating the torchaudio and torchvision packages with pip install torchaudio torchvision --upgrade.

As for the GPU being unusable, I think this is related to your CUDA version. ~~Could you share with us the output of running nvidia-smi?~~ (Sorry, I saw it was shared in a previous reply.)
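
Whether an upgrade replaced a CUDA-enabled PyTorch build can be checked from Python. This is a minimal sketch, assuming that a build without CUDA support reports torch.version.cuda as None. The function names describe_build and check_torch are illustrative, and the import is guarded so the script also runs where PyTorch is missing.

```python
def describe_build(torch_version, cuda_version, cuda_available):
    """Summarize a PyTorch install from three values read off the torch module."""
    if cuda_version is None:
        return f"torch {torch_version}: build has no CUDA support"
    if not cuda_available:
        return (
            f"torch {torch_version}: built for CUDA {cuda_version}, "
            "but no usable GPU was found"
        )
    return f"torch {torch_version}: built for CUDA {cuda_version}, GPU usable"


def check_torch():
    """Inspect the installed PyTorch, if any, and return a one-line summary."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    return describe_build(
        torch.__version__, torch.version.cuda, torch.cuda.is_available()
    )


if __name__ == "__main__":
    print(check_torch())
```

If the first message appears right after an upgrade, the CUDA wheel was probably replaced by a CPU-only one. The second message points instead at a driver or CUDA runtime problem on the machine.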

lihaoya5 · Answer 7 · Sun Mar 03 2024 17:06:42 GMT+0800 (China Standard Time)

Thank you very much for your answer. After running pip install torchaudio torchvision --upgrade, the GPU is still unusable, so it is probably related to my CUDA version. I want to try your environment configuration first; if there is still an error, I will ask again. Thank you very much.

Federico Berto · Answer 8 · Mon Mar 11 2024 15:58:34 GMT+0800 (China Standard Time)

@lihaoya5 did you manage to fix the problem?

lihaoya5 · Answer 9 · Mon Mar 11 2024 16:34:24 GMT+0800 (China Standard Time)

Yes, I solved the problem by following your environment configuration.

ai4co / rl4co
