nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

A question about how to calculate r_A

lucywang720 opened this issue

We are reproducing this paper, but we are confused about r_A. How is r_A calculated from the per-step rewards produced by the PRM? I would appreciate your help!