How were the expert trajectories collected?

Question

Fu-Dayuan opened this issue 3 months ago · comments

As the title says: were the expert trajectories sampled from ChatGPT (or GPT-4), or from llama-chat? I noticed that even the SFT version scores much higher than the llama-chat version.

Yifan Song · Answer 1 · Mon Mar 11 2024 10:15:14 GMT+0800 (China Standard Time)

Hello, and thank you for your interest in our work!

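For WebShop, part of the expert trajectories come from the human demonstrations provided by the WebShop authors; for the rest, we used GPT-4 to explore the environment and kept only trajectories with final reward >= 0.7.
ScienceWorld provides an algorithm that automatically generates golden trajectories; we preprocessed these and used GPT-4 to annotate the CoT.
For ALFWorld, we preprocessed the human demonstrations in the original data to obtain the expert trajectories, and likewise used GPT-4 to annotate the CoT.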
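
The WebShop filtering step (keep GPT-4 exploration trajectories with final reward >= 0.7) can be illustrated with a minimal sketch. This is not the project's actual code: the `Trajectory` class, its fields, and `filter_expert_trajectories` are hypothetical names introduced here for illustration, and the sample rewards are made up. Only the 0.7 threshold comes from the answer above.

```python
from dataclasses import dataclass, field

# Threshold stated in the answer above; everything else here is an assumption.
REWARD_THRESHOLD = 0.7


@dataclass
class Trajectory:
    """Hypothetical container for one explored episode."""
    task_id: str
    steps: list = field(default_factory=list)  # e.g. (thought, action, observation) tuples
    final_reward: float = 0.0                  # reward returned at the end of the episode


def filter_expert_trajectories(trajectories, threshold=REWARD_THRESHOLD):
    """Keep only trajectories whose final reward reaches the threshold."""
    return [t for t in trajectories if t.final_reward >= threshold]


# Made-up sample data, for illustration only.
candidates = [
    Trajectory("webshop-1", final_reward=1.0),
    Trajectory("webshop-2", final_reward=0.5),
    Trajectory("webshop-3", final_reward=0.7),
]

experts = filter_expert_trajectories(candidates)
kept_ids = [t.task_id for t in experts]
print(kept_ids)
```

Because the comparison is inclusive, a trajectory that scores exactly 0.7 is kept, matching the `>= 0.7` condition described above.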

Yanan · Answer 2 · Mon Apr 08 2024 23:54:35 GMT+0800 (China Standard Time)

Hello, could you share the expert trajectories used for training in this project?

Thanks.

Yifan Song · Answer 3 · Tue Apr 09 2024 14:54:17 GMT+0800 (China Standard Time)

Hello, setup.sh downloads the expert trajectories automatically, covering all three environments (WebShop, ScienceWorld, and ALFWorld). You can also download them here: https://drive.google.com/file/d/1YbhbL8RhQGDWFv5y6k1qgwRqSyFFsao8/view?usp=drive_link