ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

Home Page: https://rl4.co

[BUG] Wrong behavior in warmup baseline

LTluttmann opened this issue · comments

Describe the bug

The warmup baseline should simply return the evaluation results of the "normal" baseline (self.baseline) once the number of warmup epochs is exceeded. However, the alpha attribute keeps growing and takes values larger than 1 in every subsequent epoch, leading to incorrect baseline values:

# once kw["epoch"] + 1 exceeds self.n_epochs, alpha grows beyond 1
self.alpha = (kw["epoch"] + 1) / float(self.n_epochs)
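
A minimal sketch of a possible fix, assuming the surrounding epoch callback stays unchanged, is to clamp alpha at 1 so the warmup baseline contribution vanishes once the warmup phase is over:

# clamp alpha so it never exceeds 1 after the warmup epochs (a sketch, not the current rl4co code)
self.alpha = min(1.0, (kw["epoch"] + 1) / float(self.n_epochs))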

Moreover, a misplaced parenthesis results in a wrong combination of the warmup (exponential) baseline loss and the actual baseline loss:

return self.alpha * v_b + (1 - self.alpha) * v_wb, self.alpha * l_b + (
    1 - self.alpha * l_wb
)
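
With the parenthesis moved, the loss term becomes the same convex combination as the baseline value. A sketch of the presumably intended return, reusing the variable names from the snippet above:

# combine baseline and warmup-baseline values and losses with the same weight alpha (sketch of intended behavior)
return self.alpha * v_b + (1 - self.alpha) * v_wb, self.alpha * l_b + (1 - self.alpha) * l_wb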

To Reproduce

Not a breaking bug. See the results posted in wouterkool/attention-learn-to-route#51

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)