mask_logits in AttentionModel is set to False by default

Question

caiqi opened this issue 2 years ago

Thanks for the awesome work and for sharing the code! mask_logits is set to False by default, so the leaf logits are normalized without removing the masked leaf nodes. Is this intended, or is it a typo?

Hang Zhao · Answer 1 · Thu Jul 07 2022 10:07:51 GMT+0800 (China Standard Time)

Thank you so much for your feedback. This is intended: the leaf logits are normalized without removing the masked leaf nodes, because we found that policy training is unstable when no leaf node is valid and every logit is set to '-inf'. Instead, the masked leaf nodes are removed when calculating the action probabilities (lines 139 - 144).
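The instability mentioned in the answer can be reproduced in a few lines. The following is a minimal pure-Python sketch, not the repository's code: the function names `softmax`, `mask_then_softmax`, and `softmax_then_mask` are hypothetical, and the renormalization step is only an assumption about what removing leaf nodes "when calculating action probabilities" might look like. It shows that setting every logit to `-inf` before the softmax yields NaN, whereas masking after the softmax keeps all values finite.

```python
import math

NEG_INF = float("-inf")


def softmax(logits):
    """Max-shifted softmax. If every logit is -inf, the shift computes
    -inf - (-inf), which is NaN, so every output is NaN."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_then_softmax(logits, valid):
    """Mask first (the mask_logits=True style): invalid leaves get -inf."""
    masked = [x if v else NEG_INF for x, v in zip(logits, valid)]
    return softmax(masked)


def softmax_then_mask(logits, valid):
    """Normalize first, then drop invalid leaves (assumed variant).
    The surviving probabilities are renormalized when any leaf is valid."""
    probs = softmax(logits)
    kept = [p if v else 0.0 for p, v in zip(probs, valid)]
    total = sum(kept)
    if total == 0.0:
        # No valid leaf: return zeros instead of dividing by zero.
        return kept
    return [p / total for p in kept]


logits = [1.0, 2.0, 3.0]

# Degenerate case from the answer: no leaf node is valid.
none_valid = [False, False, False]
print(mask_then_softmax(logits, none_valid))   # contains NaN values
print(softmax_then_mask(logits, none_valid))   # finite values only

# Ordinary case: at least one valid leaf, both orderings agree.
some_valid = [True, False, True]
print(mask_then_softmax(logits, some_valid))
print(softmax_then_mask(logits, some_valid))
```

When at least one leaf is valid, the two orderings produce the same distribution, so they differ only in the degenerate all-invalid case. In a real training loop a NaN at that point would propagate into the loss and gradients, which is consistent with the instability the answer describes.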