A question about how to calculate r_A

lucywang720 opened this issue a year ago

We are reproducing this paper, but we are confused about r_A. How is r_A calculated from the per-step rewards produced by the PRM? We would appreciate your help!