[BUG] SharedBaseline on_dim default value should be -1
hyeok9855 opened this issue · comments
Describe the bug
I think the default value of on_dim in SharedBaseline should be -1.
To Reproduce
Simply change the baseline and run
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have provided a minimal working example to reproduce the bug (required)
How are you changing the baseline?
If you use SharedBaseline, you should also use an inference technique that gives several rollouts per instance (e.g. multistart for POMO or augmentation for SymNCO), so if you just use the default model with baseline="shared", it will not work. With those techniques the reward is always reshaped to [batch, num_pomo/num_aug], which is why the current on_dim default does not fail there.
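To make the shape argument concrete, here is a minimal sketch of a shared baseline over a reward shaped [batch, num_starts]. It uses plain Python lists rather than tensors so it runs without any framework; the function name `shared_baseline_advantage` and the sample rewards are hypothetical and are not taken from the library.

```python
from statistics import fmean


def shared_baseline_advantage(reward):
    """Hypothetical helper: reward is a nested list shaped [batch][num_starts]."""
    # One baseline per instance: the mean reward over that instance's rollouts
    # (the second dimension of the reward).
    baseline = [fmean(row) for row in reward]
    # Advantage of each rollout relative to its own instance's baseline.
    advantage = [[r - b for r in row] for row, b in zip(reward, baseline)]
    return advantage, baseline


# Two instances with three rollouts each (made-up rewards).
shared_reward = [[-10.0, -12.0, -11.0], [-20.0, -22.0, -24.0]]
shared_adv, shared_base = shared_baseline_advantage(shared_reward)
print(shared_base)
print(shared_adv)
```

Averaging over the second dimension only makes sense when that dimension exists, which is the case after multistart or augmentation but not for a reward shaped [batch].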
Okay I see. But actually I wanted to use just the mean of reward as a naive baseline.
So in this case, should I implement this locally?
I see! Then feel free to submit a PR ;)
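For reference, the naive baseline discussed above (the mean of the reward over the batch) could look roughly like the sketch below. It assumes a reward with a single value per instance and uses plain Python lists; `mean_baseline_advantage` is a hypothetical name, not an existing function in the repo.

```python
from statistics import fmean


def mean_baseline_advantage(reward):
    """Hypothetical helper: reward is a flat list shaped [batch]."""
    # A single scalar baseline shared by the whole batch.
    baseline = fmean(reward)
    # Advantage of each instance relative to the batch mean.
    advantage = [r - baseline for r in reward]
    return advantage, baseline


# Three instances, one rollout each (made-up rewards).
naive_reward = [-10.0, -14.0, -12.0]
naive_adv, naive_base = mean_baseline_advantage(naive_reward)
print(naive_base)
print(naive_adv)
```

Because the reward here has only one dimension, the mean has to be taken over the last (and only) axis, which is the case a default of -1 would cover.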