AjayP13 / webdreamer

Training LLMs as web agents with synthetic data. πŸ€–πŸŒ

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

WebDreamer

This repository implements the Large Language Models Can Self-Improve At Web Agent Tasks paper.

This project evaluates & trains LLMs as web agents on the WebArena benchmark with synthetic data using DataDreamer and evaluates using the VERTEX score from SymbolicAI. πŸ€–πŸŒ

Setup and Install

The ideal version of Python for this project is Python 3.10. The project can be cloned and setup with:

git clone --recurse-submodules git@github.com:AjayP13/webdreamer.git
cd webdreamer/
git config --local core.hooksPath ./.githooks/
./.githooks/post-checkout

Environment Setup

Before running, you will want to edit the project.env file and fill in all the environment variables with the needed values.

Running

To run the project you can simply do the following command to see the list of options of tasks that can be run:

./run.sh --help

The ./run.sh file will automatically setup a virtual environment and setup project dependencies on each run. After the first time you run this, all project dependencies will be setup. To skip checking / installing dependencies to make future runs faster, see the PROJECT_SKIP_INSTALL_REQS environment variable in project.env.

Formatting and Linting

You can automatically format and lint the code with:

./format.sh

Additionally, a pre-commit hook will automatically format & enforce Python style through linting when committing.

About

Training LLMs as web agents with synthetic data. πŸ€–πŸŒ

License:MIT License


Languages

Language:Python 61.4%Language:Shell 38.6%