[BUG] possible bug in Rollout Baseline
fedebotu opened this issue · comments
In both mTSP and PDP, with the rollout baseline, we may get exploding behavior (the loss increases after some time).
I suspect this may be due to gradient clipping by PyTorch Lightning, so we may have to investigate.
I can confirm the bug: there is an error in the REINFORCE logic. I will be working on this.
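For context, here is a minimal sketch of what a REINFORCE loss with a rollout baseline usually looks like (hypothetical tensor names, not this repo's actual implementation). Two common causes of the loss blowing up are a sign error on the advantage term and a missing `detach()` on the baseline value:

```python
import torch


def reinforce_loss(log_likelihood: torch.Tensor,
                   reward: torch.Tensor,
                   baseline_reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE loss with a (greedy) rollout baseline.

    log_likelihood:  sum of log-probs of the sampled actions, shape [batch]
    reward:          reward of the sampled solutions, shape [batch]
    baseline_reward: reward of the baseline policy's greedy rollout, shape [batch]
    """
    # The baseline must not receive gradients through this loss;
    # detaching guards against accidental backprop into the baseline network.
    advantage = reward - baseline_reward.detach()
    # Maximize advantage-weighted log-likelihood, i.e. minimize its negative.
    # Flipping this sign makes gradient descent *increase* the loss over time.
    return -(advantage * log_likelihood).mean()
```

With rewards defined as negative tour cost (as is typical in routing problems), a positive advantage means the sampled solution beat the greedy baseline rollout, so its log-likelihood is pushed up.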