princeton-nlp / WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Home Page: https://webshop-pnlp.github.io

Reproducing Rule-based baseline and not matching paper results

lihkinVerma opened this issue

Hey Authors,

I tried to replicate the results for the rule-based baseline.
The paper reports Score / SR = 45.8 / 19% for this baseline,
but my replication gives Score / SR = 26.27 / 3.59%.
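For comparing numbers like these, the sketch below shows how the two headline metrics are aggregated from per-episode rewards, assuming each episode yields a reward in [0, 1], Score is the mean reward scaled to 0-100, and SR is the percentage of episodes whose reward is exactly 1. The function name `score_and_success_rate` is hypothetical and not part of the WebShop codebase.

```python
from typing import Sequence, Tuple


def score_and_success_rate(rewards: Sequence[float]) -> Tuple[float, float]:
    """Aggregate per-episode rewards into (Score, SR), both on a 0-100 scale.

    Assumption: each reward lies in [0, 1] and an episode counts as a
    success only when its reward is exactly 1.0.
    """
    if not rewards:
        raise ValueError("need at least one episode reward")
    n = len(rewards)
    # Score: mean reward over all episodes, scaled to 0-100.
    score = 100.0 * sum(rewards) / n
    # SR: percentage of episodes with a perfect reward.
    success_rate = 100.0 * sum(1 for r in rewards if r == 1.0) / n
    return score, success_rate


# Hypothetical rewards for four episodes (illustrative, not real results).
print(score_and_success_rate([1.0, 0.5, 0.25, 0.0]))  # (43.75, 25.0)
```

Because SR only counts perfect episodes, it can drop much faster than Score when one reward component (for example options) is consistently missed.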

None of the individual reward components match the paper's baseline either. I obtained:
r_type: 0.5826
r_attr: 0.4108
r_option: 0.0
r_price: 0.0632
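To see how these components feed into a single episode reward, here is a minimal sketch of the combination as we understand it from the paper: the type-match score multiplies the fraction of goal attributes, goal options, and the price constraint that the purchased item satisfies. Treat this as an assumption and check it against the repository's own reward code; `EpisodeMatch` and `episode_reward` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EpisodeMatch:
    # Hypothetical container for one episode's matching results.
    r_type: float        # product-type match score in [0, 1]
    attrs_matched: int   # goal attributes satisfied by the purchased item
    attrs_total: int     # goal attributes requested in the instruction
    opts_matched: int    # goal options (e.g. size, color) selected correctly
    opts_total: int      # goal options requested in the instruction
    price_ok: bool       # True if purchased price <= goal price ceiling


def episode_reward(m: EpisodeMatch) -> float:
    # Assumed form: r_type * (matched attrs + matched options + price ok)
    #                        / (total attrs + total options + 1)
    matched = m.attrs_matched + m.opts_matched + int(m.price_ok)
    total = m.attrs_total + m.opts_total + 1
    return m.r_type * matched / total


# Illustrative episode: right product type, 2 of 3 attributes,
# 0 of 1 options, price within budget -> 1.0 * 3 / 5
print(episode_reward(EpisodeMatch(1.0, 2, 3, 0, 1, True)))  # 0.6
```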

Could you look into this discrepancy?

Hey @lihkinVerma,

I am also interested in replicating these scores.

How do you run the above? Specifically:

  1. How do you run WebShop (which server / env do you use?), e.g. did you use `./setup.sh -d small` followed by `./run_dev.sh`?
  2. How do you run the baseline?