[BUG] possible bug in Rollout Baseline
fedebotu opened this issue · comments
In both mTSP and PDP, with the rollout baseline, we may get exploding behavior (the loss increases after some time).
I suspect this may be due to gradient clipping by PyTorch Lightning, so we may have to investigate.
I can confirm the bug: there is an error in the REINFORCE logic. I will be working on this.
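For context, here is a minimal sketch of what a REINFORCE loss with a rollout baseline usually looks like (hypothetical tensor names, not this repo's actual implementation). Two common causes of the loss blowing up are a sign error on the advantage term and a missing `detach()` on the baseline value:

```python
import torch


def reinforce_loss(log_likelihood: torch.Tensor,
                   reward: torch.Tensor,
                   baseline_reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE loss with a (greedy) rollout baseline.

    log_likelihood:  sum of log-probs of the sampled actions, shape [batch]
    reward:          reward of the sampled solutions, shape [batch]
    baseline_reward: reward of the baseline policy's greedy rollout, shape [batch]
    """
    # The baseline must not receive gradients through this loss;
    # detaching guards against accidental backprop into the baseline network.
    advantage = reward - baseline_reward.detach()
    # Maximize advantage-weighted log-likelihood, i.e. minimize its negative.
    # Flipping this sign makes gradient descent *increase* the loss over time.
    return -(advantage * log_likelihood).mean()
```

With rewards defined as negative tour cost (as is typical in routing problems), a positive advantage means the sampled solution beat the greedy baseline rollout, so its log-likelihood is pushed up.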