[BUG] The reward calculation is abnormal in the mTSP environment
wenwenla opened this issue
Describe the bug
It seems that the distance back to the depot is not considered in this implementation.
Are there any special considerations behind this?
I believe the problem is caused by the following:
current_length is reset to zero before its final value (including the return leg to the depot) can be written into max_subtour_length in https://github.com/ai4co/rl4co/blob/main/rl4co/envs/routing/mtsp.py#L93
# If current agent is different from previous agent, then we have a new subtour and reset the length, otherwise we add the new distance
current_length = torch.where(
    cur_agent_idx != td["agent_idx"],
    torch.zeros_like(td["current_length"]),
    td["current_length"] + get_distance(cur_loc, prev_loc),
)
# If done, we add the distance from the current_node to the depot as well
current_length = torch.where(
    done, current_length + get_distance(cur_loc, depot_loc), current_length
)
# We update the max_subtour_length and reset the current_length
max_subtour_length = torch.where(
    current_length > td["max_subtour_length"],
    current_length,
    td["max_subtour_length"],
)
To Reproduce
from rl4co.envs import MTSPEnv
from rl4co.models.zoo import AttentionModel, AttentionModelPolicy
from rl4co.utils.trainer import RL4COTrainer
env = MTSPEnv(num_loc=50, min_loc=0, max_loc=1, min_num_agents=5, max_num_agents=5, cost_type="minmax")
model = AttentionModel(env, baseline="rollout", train_data_size=100_000, val_data_size=10_000, optimizer_kwargs={'lr': 1e-4})
trainer = RL4COTrainer(max_epochs=10000, accelerator='gpu', devices=2, logger=None)
trainer.fit(model)
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have provided a minimal working example to reproduce the bug (required)
Hi @wenwenla! Thanks for reporting this bug! It is true that in the previous code, current_length
was reset to 0 before the distance from the last subtour node back to the depot was added, which caused max_subtour_length
to be calculated incorrectly.
I fixed this bug in commit 7255384. You can now use the following minimal snippet to test the max_subtour_length
recording.
import torch
from rl4co.envs import MTSPEnv
env = MTSPEnv(num_loc=10, min_loc=0, max_loc=1, min_num_agents=2, max_num_agents=2, cost_type="minmax")
td = env.reset(batch_size=[3])
manual_actions = torch.tensor([
    [0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9],
    [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
])
current_length_list = []
max_length_list = []
for action_idx in range(manual_actions.shape[1]):
    td.set("action", manual_actions[:, action_idx])
    td = env.step(td)["next"]
    current_length_list.append(td["current_length"])
    max_length_list.append(td["max_subtour_length"])
print(torch.stack(current_length_list))
print(torch.stack(max_length_list))
Here is one example output from my run, just for reference:
tensor([[0.0000, 0.0000, 0.0000],
[0.4049, 0.0000, 0.7825],
[1.1483, 0.5790, 0.9177],
[1.7881, 1.2751, 1.5645],
[2.4722, 1.8814, 1.9754],
[0.0000, 2.2702, 2.9125],
[0.5037, 2.5459, 3.5903],
[0.7616, 3.1266, 3.9620],
[1.0053, 4.0521, 4.0747],
[1.3667, 4.3621, 5.5828],
[2.4768, 6.0978, 0.0000]])
tensor([[0.0000, 0.0000, 0.0000],
[0.4049, 0.0000, 0.7825],
[1.1483, 0.5790, 0.9177],
[1.7881, 1.2751, 1.5645],
[2.4722, 1.8814, 1.9754],
[2.9636, 2.2702, 2.9125],
[2.9636, 2.5459, 3.5903],
[2.9636, 3.1266, 3.9620],
[2.9636, 4.0521, 4.0747],
[2.9636, 4.3621, 5.5828],
[2.9636, 6.0978, 6.4194]])
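For a fully independent cross-check of numbers like these, the minmax cost can also be computed from scratch, outside the environment. The sketch below is a hypothetical pure-Python helper, not part of rl4co: it splits a flat action sequence into subtours at each depot visit, adds the implicit depot legs at both ends of every subtour, and returns the longest one. Given an instance's coordinates and actions, its result should agree with the final max_subtour_length reported by the env.

```python
import math

def minmax_cost(locs, actions):
    """Reference max-subtour length for an mTSP tour.

    locs: list of (x, y) coordinates with the depot at index 0.
    actions: flat node sequence; each visit to node 0 starts a new agent.
    Every subtour implicitly starts and ends at the depot.
    """
    # Split the flat action sequence into subtours at depot visits.
    tours, cur = [], []
    for a in actions:
        if a == 0:
            if cur:
                tours.append(cur)
            cur = []
        else:
            cur.append(a)
    if cur:
        tours.append(cur)

    best = 0.0
    for tour in tours:
        path = [0] + tour + [0]  # depot -> nodes -> depot
        length = sum(math.dist(locs[u], locs[v]) for u, v in zip(path, path[1:]))
        best = max(best, length)
    return best

# Toy instance: depot at the origin, four cities on the axes.
locs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(minmax_cost(locs, [0, 1, 2, 0, 3, 4]))  # 4.0 (both subtours are 1 + 1 + 2)
```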
Thank you for your quick fix.