ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

Home Page: https://rl4.co

[BUG] The reward calculation is abnormal in the mTSP environment

wenwenla opened this issue

Describe the bug

It seems that the distance back to the depot is not considered in this implementation.
Are there any special considerations?
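
For reference, the minmax cost is conventionally the length of the longest closed subtour, i.e. each agent's route including the final leg back to the depot (writing x_0 for the depot):

    \text{cost} = \max_k \Big( d(x_0, x_{k,1}) + \sum_{i=1}^{n_k-1} d(x_{k,i}, x_{k,i+1}) + d(x_{k,n_k}, x_0) \Big), \qquad \text{reward} = -\text{cost}

so a missing return leg would make the reward overestimate the quality of the longest subtour.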

I believe that the problem is caused by the following reason:

current_length is zeroed (when the agent changes) before it is used to update max_subtour_length in https://github.com/ai4co/rl4co/blob/main/rl4co/envs/routing/mtsp.py#L93, so the return leg of a finished subtour never enters the max (a possible reordering is sketched after the snippet below):

        # If current agent is different from previous agent, then we have a new subtour and reset the length, otherwise we add the new distance
        current_length = torch.where(
            cur_agent_idx != td["agent_idx"],
            torch.zeros_like(td["current_length"]),
            td["current_length"] + get_distance(cur_loc, prev_loc),
        )

        # If done, we add the distance from the current_node to the depot as well
        current_length = torch.where(
            done, current_length + get_distance(cur_loc, depot_loc), current_length
        )

        # We update the max_subtour_length and reset the current_length
        max_subtour_length = torch.where(
            current_length > td["max_subtour_length"],
            current_length,
            td["max_subtour_length"],
        )
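
One possible reordering (just a sketch that reuses the variables from the snippet above, not necessarily how the maintainers would fix it) is to extend current_length with the leg that was just traveled, fold it into max_subtour_length, and only then reset it when a new agent starts:

        # Extend the running length with the leg just traveled; when the depot is
        # selected, cur_loc is the depot, so this already closes the previous subtour
        current_length = td["current_length"] + get_distance(cur_loc, prev_loc)

        # If done, also add the distance from the current node back to the depot
        current_length = torch.where(
            done, current_length + get_distance(cur_loc, depot_loc), current_length
        )

        # Update the max BEFORE any reset, so the closed subtour (return leg included) is counted
        max_subtour_length = torch.where(
            current_length > td["max_subtour_length"],
            current_length,
            td["max_subtour_length"],
        )

        # Only now reset the running length when a new agent (i.e. a new subtour) starts
        current_length = torch.where(
            cur_agent_idx != td["agent_idx"],
            torch.zeros_like(current_length),
            current_length,
        )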

To Reproduce

from rl4co.envs import MTSPEnv
from rl4co.models.zoo import AttentionModel, AttentionModelPolicy
from rl4co.utils.trainer import RL4COTrainer


env = MTSPEnv(num_loc=50, min_loc=0, max_loc=1, min_num_agents=5, max_num_agents=5, cost_type="minmax")
model = AttentionModel(env, baseline="rollout", train_data_size=100_000, val_data_size=10_000, optimizer_kwargs={'lr': 1e-4})

trainer = RL4COTrainer(max_epochs=10000, accelerator='gpu', devices=2, logger=None)
trainer.fit(model)

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)

Hi @wenwenla! Thanks for reporting this bug! It is true that in the previous code, current_length was reset to 0 before the distance from the last subtour node back to the depot was added, which caused max_subtour_length to be calculated incorrectly.

I fixed this bug in 7255384. You can now use the following minimal code to check that max_subtour_length is recorded correctly.

import torch
from rl4co.envs import MTSPEnv

env = MTSPEnv(num_loc=10, min_loc=0, max_loc=1, min_num_agents=2, max_num_agents=2, cost_type="minmax")
td = env.reset(batch_size=[3])

# Each row is one instance's action sequence; action 0 selects the depot and closes the current subtour
manual_actions = torch.tensor([
    [0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9],
    [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
])

current_length_list = []
max_length_list = []
for action_idx in range(len(manual_actions[-1])):
    td.set("action", manual_actions[:, action_idx])
    td = env.step(td)["next"]
    current_length_list.append(td["current_length"])
    max_length_list.append(td["max_subtour_length"])

print(torch.stack(current_length_list))
print(torch.stack(max_length_list))

Here is one example output from my run, for reference:

tensor([[0.0000, 0.0000, 0.0000],
        [0.4049, 0.0000, 0.7825],
        [1.1483, 0.5790, 0.9177],
        [1.7881, 1.2751, 1.5645],
        [2.4722, 1.8814, 1.9754],
        [0.0000, 2.2702, 2.9125],
        [0.5037, 2.5459, 3.5903],
        [0.7616, 3.1266, 3.9620],
        [1.0053, 4.0521, 4.0747],
        [1.3667, 4.3621, 5.5828],
        [2.4768, 6.0978, 0.0000]])
tensor([[0.0000, 0.0000, 0.0000],
        [0.4049, 0.0000, 0.7825],
        [1.1483, 0.5790, 0.9177],
        [1.7881, 1.2751, 1.5645],
        [2.4722, 1.8814, 1.9754],
        [2.9636, 2.2702, 2.9125],
        [2.9636, 2.5459, 3.5903],
        [2.9636, 3.1266, 3.9620],
        [2.9636, 4.0521, 4.0747],
        [2.9636, 4.3621, 5.5828],
        [2.9636, 6.0978, 6.4194]])
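
If you want an extra sanity check, the final max_subtour_length values can also be cross-checked by recomputing them by brute force from the coordinates. Below is a hypothetical helper (not part of rl4co), assuming td["locs"][..., 0, :] is the depot and that every visit to index 0 closes the current agent's subtour:

import torch

def brute_force_minmax(locs, actions):
    """Max closed-subtour length for a single instance (cross-check only)."""
    # Split the action sequence into subtours at every depot visit
    subtours, current = [], [0]
    for a in actions.tolist():
        if a == 0:
            if len(current) > 1:  # close a non-empty subtour at the depot
                subtours.append(current)
            current = [0]
        else:
            current.append(a)
    if len(current) > 1:  # the last agent's subtour is closed implicitly
        subtours.append(current)

    # Each subtour is closed by the return leg back to the depot
    lengths = []
    for tour in subtours:
        pts = locs[tour + [0]]
        lengths.append((pts[1:] - pts[:-1]).norm(dim=-1).sum())
    return torch.stack(lengths).max()

# e.g. for the first instance of the batch above:
# print(brute_force_minmax(td["locs"][0], manual_actions[0]))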

Thank you for your quick fix.