A question about how to calculate r_A
lucywang720 opened this issue
We are reproducing this paper, but we are confused about r_A. May I ask how to calculate r_A from the per-step rewards produced by the PRM? I would appreciate your help!