[BUG] Wrong behavior in warmup baseline
LTluttmann opened this issue
Describe the bug
The warmup baseline should simply return the evaluation results of the "normal" baseline (`self.baseline`) once the number of warmup epochs is exceeded. However, the `alpha` attribute keeps growing beyond 1 for all subsequent epochs, leading to weird baseline values.
rl4co/rl4co/models/rl/reinforce/baselines.py
Line 128 in fd58215
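The intended behavior can be sketched as follows (a minimal sketch with assumed names, not the actual rl4co code): the blend weight should saturate at 1 after warmup instead of growing without bound.

```python
def warmup_alpha(epoch: int, n_warmup_epochs: int) -> float:
    """Blend weight for the warmup wrapper:
    0 -> pure warmup baseline, 1 -> pure inner baseline.

    Clamping with min() is the fix: without it, alpha exceeds 1 once
    epoch >= n_warmup_epochs, so the combination below is no longer
    convex and the warmup term gets a negative weight.
    """
    return min(1.0, (epoch + 1) / n_warmup_epochs)
```

For example, with 5 warmup epochs this yields 0.2, 0.4, ..., 1.0 and then stays at 1.0, so the wrapper reduces to `self.baseline` after warmup.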
Moreover, a misplaced parenthesis results in a wrong combination of the exponential and the actual baseline loss
rl4co/rl4co/models/rl/reinforce/baselines.py
Lines 121 to 123 in fd58215
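For reference, a correct convex combination of the two baselines would look like the sketch below (variable names are assumed for illustration; `v, l` are the inner baseline's value and loss, `vw, lw` the warmup baseline's). A parenthesis slip such as `alpha * l + (1 - alpha * lw)` instead adds a constant 1 and scales `lw` by `-alpha` rather than `(1 - alpha)`.

```python
def blend(alpha: float, v: float, l: float, vw: float, lw: float):
    # Convex combination: weight alpha on the inner baseline,
    # (1 - alpha) on the warmup (exponential) baseline.
    value = alpha * v + (1 - alpha) * vw
    loss = alpha * l + (1 - alpha) * lw
    return value, loss
```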
To Reproduce
Not a breaking bug. See the results posted in wouterkool/attention-learn-to-route#51
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have provided a minimal working example to reproduce the bug (required)