Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Home Page: https://qiushisun.github.io/OS-Genesis-Home/

OS-Genesis

This repository contains the code and data for the paper OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis.

We are uploading the data and checkpoints. Due to bandwidth limitations, this will take some time. Stay tuned!

Overview

We introduce OS-Genesis, an interaction-driven pipeline for synthesizing high-quality and diverse GUI agent trajectory data without human supervision or predefined tasks. By leveraging reverse task synthesis and a trajectory reward model, OS-Genesis enables effective end-to-end training of GUI agents.
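To make the idea more concrete, here is a toy sketch of how such a pipeline could be organized: interactions are recorded first, a task description is derived from them afterwards, and a reward score decides which trajectories are kept. This is not the code in this repository. The names `Step`, `Trajectory`, `build_dataset`, `toy_annotate`, and `toy_score` are hypothetical, and the threshold filter is an assumption, since the text above does not say how the reward model's scores are used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded interaction: screen before, action taken, screen after."""
    pre_state: str
    action: str
    post_state: str


@dataclass
class Trajectory:
    task: str          # instruction derived after the fact from the steps
    steps: List[Step]
    reward: float      # score assigned by the (hypothetical) reward function


def build_dataset(
    explorations: List[List[Step]],
    annotate: Callable[[List[Step]], str],
    score: Callable[[str, List[Step]], float],
    min_reward: float,
) -> List[Trajectory]:
    """Turn task-free explorations into labeled trajectories.

    `annotate` stands in for a model that writes a task from observed steps
    (the "reverse" direction); `score` stands in for a trajectory reward model.
    Keeping only trajectories above `min_reward` is an assumption of this sketch.
    """
    dataset = []
    for steps in explorations:
        task = annotate(steps)
        reward = score(task, steps)
        if reward >= min_reward:
            dataset.append(Trajectory(task, steps, reward))
    return dataset


def toy_annotate(steps: List[Step]) -> str:
    # Toy stand-in: describe the task by the final screen that was reached.
    return "Reach the screen: " + steps[-1].post_state


def toy_score(task: str, steps: List[Step]) -> float:
    # Toy stand-in: fraction of steps that actually changed the screen.
    changed = sum(s.pre_state != s.post_state for s in steps)
    return changed / len(steps)
```

For example, `build_dataset(explorations, toy_annotate, toy_score, min_reward=0.5)` would keep an exploration whose single step moves from a home screen to a settings screen and drop one whose step leaves the screen unchanged.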

Training

For training details and procedures, please refer to the InternVL2 and Qwen2-VL documentation.

Evaluation

AndroidControl

To evaluate on the AndroidControl benchmark, please follow the steps below:

  1. Clone the GitHub Repository:

    git clone https://github.com/OS-Copilot/OS-Genesis.git
    
  2. Inference:

    cd OS-Genesis/evaluation
    bash run_ac_inference.sh
    
  3. Evaluation:

    python ac_eval.py
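If you prefer to drive the three steps from a single script, they can be sketched in Python as below. The helper names `evaluation_steps` and `run_steps` are hypothetical and not part of this repository, and the sketch assumes that `ac_eval.py` is meant to be run from the same `OS-Genesis/evaluation` directory as the inference script.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

REPO_URL = "https://github.com/OS-Copilot/OS-Genesis.git"


def evaluation_steps(workdir: Path) -> List[Tuple[List[str], Path]]:
    """Return (command, working directory) pairs for the three steps."""
    # Assumption: inference and evaluation both run inside evaluation/.
    eval_dir = workdir / "OS-Genesis" / "evaluation"
    return [
        (["git", "clone", REPO_URL], workdir),      # 1. clone the repository
        (["bash", "run_ac_inference.sh"], eval_dir),  # 2. inference
        (["python", "ac_eval.py"], eval_dir),         # 3. evaluation
    ]


def run_steps(steps: List[Tuple[List[str], Path]]) -> None:
    """Run each command in order and stop at the first failure."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(evaluation_steps(Path(".")))` would then execute the steps in order; `check=True` makes a failed step raise instead of silently continuing to the next one.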
    

Mobile

| Model Name | Base Model | Training Data | HF Link |
| --- | --- | --- | --- |
| OS-Genesis-4B-AC | InternVL2-4B | OS-Genesis-ac-training-data | 🤗 link |
| OS-Genesis-7B-AC | Qwen2-VL-7B-Instruct | OS-Genesis-ac-training-data | 🤗 link |
| OS-Genesis-8B-AC | InternVL2-8B | OS-Genesis-ac-training-data | 🤗 link |

Web

(Coming Soon)

Citation 📖

🫶 If you are interested in our work or find this repository or our data helpful, please consider citing our paper:

@article{sun2024genesis,
  title={OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis},
  author={Sun, Qiushi and Cheng, Kanzhi and Ding, Zichen and Jin, Chuanyang and Wang, Yian and Xu, Fangzhi and Wu, Zhenyu and Jia, Chengyou and Chen, Liheng and Liu, Zhoumianze and others},
  journal={arXiv preprint arXiv:2412.19723},
  year={2024}
}
