ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

Home Page: https://rl4.co

Does rl4co's POMO support SDVRP, OP?

hanseul-jeong opened this issue · comments

Thank you for your excellent work!
Does RL4CO also support solving other CO problems besides TSP and CVRP, for instance SDVRP, OP, and PCTSP, which are applicable to the AM model?
When I attempted POMO on OP, I encountered the error below, raised at this line:

current_total_prize = td["current_total_prize"] + gather_by_index(
)

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3228,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
(the same assertion repeats for threads [1,0,0] through [31,0,0])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "rl4co/rl4co/rl4co/utils/utils.py", line 36, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "rl4co/rl4co/rl4co/tasks/train.py", line 77, in run
trainer.fit(model=model, ckpt_path=cfg.get("ckpt_path"))
File "rl4co/rl4co/rl4co/utils/trainer.py", line 145, in fit
super().fit(
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 68, in _call_and_handle_interrupt
trainer._teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1012, in _teardown
self.strategy.teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/ddp.py", line 405, in teardown
super().teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/parallel.py", line 127, in teardown
super().teardown()
File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 528, in teardown
self.lightning_module.cpu()
File "/opt/conda/lib/python3.8/site-packages/lightning/fabric/utilities/device_dtype_mixin.py", line 79, in cpu
return super().cpu()
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in cpu
return self._apply(lambda t: t.cpu())
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in
return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
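
For reference, this class of device-side assert almost always means an out-of-range index reached a gather/scatter kernel, and the Python stack trace is unreliable because CUDA reports the error asynchronously. Below is a minimal sketch in plain PyTorch (not rl4co code) showing how an out-of-bounds gather index produces this failure class and how to get a synchronous, readable error while debugging:

# Minimal sketch (plain PyTorch, not rl4co code): an out-of-range index passed
# to a gather triggers this kind of device-side assert on CUDA, while on CPU
# it fails immediately with a readable error message.
import os

# Make CUDA kernel launches synchronous so the stack trace points at the real
# offending call; must be set before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

prize = torch.rand(4, 10)                           # [batch, num_nodes]
bad_idx = torch.full((4, 1), 10, dtype=torch.long)  # 10 is out of bounds for a size-10 dim

try:
    torch.gather(prize, 1, bad_idx)                 # on CPU: clear "index ... out of bounds" error
except RuntimeError as e:
    print("caught:", e)

Running the failing command with CUDA_LAUNCH_BLOCKING=1 (or once on CPU) should point directly at the gather_by_index call in the OP environment quoted above.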

To Reproduce

I modified two lines of config/experiment/base.yaml as follows:

defaults:
  - override /model: pomo.yaml        # changed
  - override /env: op.yaml            # changed
  - override /callbacks: default.yaml
  - override /trainer: default.yaml
  - override /logger: wandb.yaml
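
For anyone who prefers to skip Hydra, roughly the same setup can be sketched through the Python API. The class and argument names below (OPEnv, POMO, RL4COTrainer, num_loc) are taken from the rl4co documentation but may differ between versions, so treat this as an assumption-laden sketch rather than a verified reproduction script:

# Minimal sketch of an equivalent setup via the Python API. Class names and
# constructor arguments are assumptions based on the rl4co docs; check your
# installed version, since the API may have changed.
from rl4co.envs import OPEnv
from rl4co.models import POMO
from rl4co.utils.trainer import RL4COTrainer

env = OPEnv(num_loc=20)        # Orienteering Problem instances with 20 nodes
model = POMO(env)              # POMO with its default policy settings

trainer = RL4COTrainer(max_epochs=1, devices=1)
trainer.fit(model)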

Hi @hanseul-jeong, thanks for finding the bug and posting this issue!
We have updated the code; you may now pull the latest version and try again. We have also created a notebook, notebooks/2-test-op.ipynb, under the test-pomo branch as a minimal example for testing POMO on the OP and SDVRP environments.
It is worth noting that even though the notebook sets num_starts equal to the number of nodes, forcing POMO to start from every node may be a suboptimal strategy for OP, since not all nodes need to be visited.
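
To make that point concrete, here is a generic sketch in plain PyTorch (not the notebook's or rl4co's internal code) of POMO-style multistart aggregation; in OP, rollouts forced to start at nodes an optimal tour would skip produce poor rewards, which affects both the max over starts and the shared mean baseline:

# Generic sketch (plain PyTorch, not rl4co internals) of POMO-style multistart
# aggregation: one rollout per forced starting node, keep the best at test time
# and use the per-instance mean as the shared baseline during training.
import torch

batch_size, num_starts = 8, 20
rewards = torch.rand(batch_size, num_starts)   # pretend reward of each forced-start rollout

best_reward, best_start = rewards.max(dim=-1)  # evaluation: best rollout over all starts

baseline = rewards.mean(dim=-1, keepdim=True)  # POMO shared baseline over starts
advantage = rewards - baseline                 # rollouts forced to start at nodes an
                                               # optimal OP tour would skip drag the
                                               # mean baseline down
print(best_reward.shape, advantage.shape)      # torch.Size([8]), torch.Size([8, 20])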

@hanseul-jeong we just updated our paper here with the latest results (see Table 4.1 and Section 4.2), in which we explain that selecting the POMO baseline in the usual way for PCTSP and OP may not be the optimal choice.

Hi @cbhua and @fedebotu,
I checked that your updated code works well and that the other baselines outperform POMO on PCTSP and OP.
It will be very helpful. I appreciate your quick response!