Does rl4co's POMO support SDVRP, OP?
hanseul-jeong opened this issue · comments
Thank you for your excellent work!
Does RL4CO also support solving CO problems other than TSP and CVRP, for instance SDVRP, OP, and PCTSP, to which the AM model is applicable?
When I tried POMO on OP, I encountered the error below (see rl4co/rl4co/envs/routing/op.py, line 83 in 22497e4):
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3228,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[The same assertion is repeated for threads [1,0,0] through [31,0,0] of block [3228,0,0].]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "rl4co/rl4co/rl4co/utils/utils.py", line 36, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "rl4co/rl4co/rl4co/tasks/train.py", line 77, in run
trainer.fit(model=model, ckpt_path=cfg.get("ckpt_path"))
File "rl4co/rl4co/rl4co/utils/trainer.py", line 145, in fit
super().fit(
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 68, in _call_and_handle_interrupt
trainer._teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1012, in _teardown
self.strategy.teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/ddp.py", line 405, in teardown
super().teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/parallel.py", line 127, in teardown
super().teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 528, in teardown
self.lightning_module.cpu()
File "/opt/conda/lib/python3.8/site-packages/lightning/fabric/utilities/device_dtype_mixin.py", line 79, in cpu
return super().cpu()
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in cpu
return self._apply(lambda t: t.cpu())
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in <lambda>
return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
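For readers hitting the same assertion: the check that fails in the kernel is simply that every gather/scatter index lies in `[0, index_size)`. Because CUDA reports this asynchronously, the Python traceback above points at an unrelated teardown call; running with `CUDA_LAUNCH_BLOCKING=1` (as the message suggests) or on CPU usually surfaces the failing call directly. The snippet below is only a pure-Python sketch of that bounds check, not code from rl4co or PyTorch. The helper name `gather_1d` and the off-by-one scenario are hypothetical illustrations, not a diagnosis of the actual bug.

```python
def gather_1d(values, index):
    """Pure-Python stand-in for a 1-D gather (hypothetical helper).

    It performs the same check the CUDA kernel asserts:
    every index must satisfy 0 <= idx < len(values).
    """
    size = len(values)
    # Collect (position, bad_index) pairs so the error message is actionable.
    bad = [(pos, idx) for pos, idx in enumerate(index) if not 0 <= idx < size]
    if bad:
        raise IndexError(
            f"index out of bounds for size {size}: (position, index) pairs {bad}"
        )
    return [values[idx] for idx in index]


# Three per-node values; valid indices are 0, 1, 2.
prizes = [10.0, 20.0, 30.0]
picked = gather_1d(prizes, [2, 0])

# Hypothetical off-by-one: an action numbered as if a depot were included
# (0..3) is used to index a tensor that only holds the 3 customer nodes.
try:
    gather_1d(prizes, [3])
    error_message = None
except IndexError as exc:
    error_message = str(exc)
```

Adding an explicit check like this just before the suspected gather or scatter call is one way to find out which index tensor goes out of range and at which step.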
To Reproduce
I modified two lines of config/experiment/base.yaml (the /model and /env overrides) as follows:
defaults:
  - override /model: pomo.yaml
  - override /env: op.yaml
  - override /callbacks: default.yaml
  - override /trainer: default.yaml
  - override /logger: wandb.yaml
Hi @hanseul-jeong, thanks for finding the bug and posting this issue!
We have updated the code, so please pull the latest version and try again. We also added a notebook, notebooks/2-test-op.ipynb on the test-pomo branch, as a minimal example for testing POMO on the OP and SDVRP environments.
Note that although the notebook sets num_starts equal to the number of nodes, forcing every node to serve as a starting point in POMO may lead to a suboptimal strategy for OP, since not all nodes need to be visited.
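To make the caveat concrete, here is a toy sketch (not rl4co code) of a three-node OP instance solved by brute force. The coordinates, prizes, and length budget are made-up assumptions, and the best achievable prize per forced first node is used as a stand-in for a rollout's reward. It shows that a rollout forced to start at a poor node cannot reach the unconstrained optimum, and that this rollout pulls down POMO's shared baseline (the mean reward over all starts).

```python
import itertools
import math

# Hypothetical toy instance: depot at the origin, node -> ((x, y), prize).
DEPOT = (0.0, 0.0)
NODES = {
    "A": ((1.0, 0.0), 1.0),
    "B": ((0.0, 1.0), 1.0),
    "C": ((0.0, -1.9), 0.1),  # far away and nearly worthless
}
MAX_LENGTH = 4.0  # tour length budget, depot to depot


def tour_length(names):
    """Length of depot -> names[0] -> ... -> names[-1] -> depot."""
    points = [DEPOT] + [NODES[n][0] for n in names] + [DEPOT]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def best_prize_with_first(first):
    """Best total prize over feasible tours whose first visit is `first`."""
    others = [n for n in NODES if n != first]
    best = float("-inf")  # stays -inf if even the single-node tour is infeasible
    for r in range(len(others) + 1):
        for rest in itertools.permutations(others, r):
            tour = (first,) + rest
            if tour_length(tour) <= MAX_LENGTH:
                best = max(best, sum(NODES[n][1] for n in tour))
    return best


# One "rollout" per forced starting node, as in POMO's multi-start scheme.
best_prize = {n: best_prize_with_first(n) for n in NODES}

# POMO's shared baseline is the mean reward over the starts;
# each rollout's advantage is its reward minus that mean.
baseline = sum(best_prize.values()) / len(best_prize)
advantage = {n: r - baseline for n, r in best_prize.items()}
```

In this instance the starts at A and B both reach a prize of 2.0, while the start forced through C can collect only 0.1, so even a perfect policy gets a low reward from that start and the shared baseline drops accordingly.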
@hanseul-jeong we just updated our paper with the latest results (see Table 4.1 and Section 4.2), where we explain that selecting the POMO baseline in the usual way may not be the optimal choice for PCTSP and OP.