agentsea / agentd

A daemon that makes a desktop OS accessible to AI agents

AgentD makes a desktop OS accessible to AI agents by exposing an HTTP API.

For a higher level interface see AgentDesk.


AgentD is currently tested on the Ubuntu 22.04 cloud image.

We recommend using one of our base VMs, which are already configured.


For QEMU, download the qcow2 image:


To use the image, we need to make a cloud-init ISO with our user-data. See this tutorial for details; on macOS it looks like this:

xorriso -as mkisofs -o cidata.iso -V "cidata" -J -r -iso-level 3 meta/
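
The `meta/` directory passed to xorriso has to contain the seed files that cloud-init's NoCloud data source reads, `user-data` and `meta-data`. The following is a minimal sketch of generating them, assuming a user-data layout with a single `agentsea` sudo user; the helper names and the `agentd-local` instance id are hypothetical and not part of agentd.

```python
from pathlib import Path


def render_user_data(ssh_public_key: str, user: str = "agentsea") -> str:
    """Return a cloud-config document that creates one sudo user with an SSH key."""
    lines = [
        "#cloud-config",
        "users:",
        f"  - name: {user}",
        "    sudo: ['ALL=(ALL) NOPASSWD:ALL']",
        "    groups: sudo",
        "    ssh_authorized_keys:",
        f"      - {ssh_public_key}",
    ]
    return "\n".join(lines) + "\n"


def render_meta_data(instance_id: str = "agentd-local") -> str:
    """Return a minimal meta-data file (the instance id is a made-up placeholder)."""
    return f"instance-id: {instance_id}\nlocal-hostname: {instance_id}\n"


def write_seed_dir(directory: str, ssh_public_key: str) -> Path:
    """Write user-data and meta-data into `directory`, creating it if needed."""
    seed = Path(directory)
    seed.mkdir(parents=True, exist_ok=True)
    (seed / "user-data").write_text(render_user_data(ssh_public_key))
    (seed / "meta-data").write_text(render_meta_data())
    return seed
```

Calling `write_seed_dir("meta", "<your public key>")` before the xorriso command would produce a directory suitable for the `cidata` volume.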

Then the image can be run with QEMU:

qemu-system-x86_64 -nographic -hda ./agentd-jammy.qcow2 \
-m 4G -smp 2 -netdev user,id=vmnet,hostfwd=tcp::6080-:6080,hostfwd=tcp::8000-:8000,hostfwd=tcp::2222-:22 \
-device e1000,netdev=vmnet -cdrom cidata.iso

Once running, the agentd service can be accessed:

curl localhost:8000/health
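
The VM can take a while to boot, so a script may prefer to poll the health endpoint rather than call it once. A minimal sketch, assuming the port forward from the QEMU command above; the function names, retry count, and delay are illustrative choices, not part of agentd.

```python
import json
import time
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the body is the documented {"status": "ok"} payload."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_for_agentd(base_url: str = "http://localhost:8000",
                    attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll GET /health until it reports ok or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error statuses
            # while the service is still starting.
            pass
        time.sleep(delay)
    return False
```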

To log in to the machine:

ssh -p 2222 agentsea@localhost


For AWS, use the public AMI ami-01a893c1530453073.

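Create a cloud-init script with your SSH key:

#cloud-config
users:
  - name: agentsea
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    groups: sudo
    ssh_authorized_keys:
      - your-ssh-public-key

package_upgrade: true

Then launch the instance, passing the script as user data:
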
aws ec2 run-instances \
    --image-id ami-01a893c1530453073 \
    --count 1 \
    --instance-type t2.micro \
    --key-name $KEY_NAME \
    --security-group-ids $SG_NAME \
    --subnet-id $SUBNET_NAME \
    --user-data file://path/to/cloud-init-config.yaml


For GCE, use the public image ubuntu-22-04-20240208044623.

gcloud compute instances create $NAME \
    --machine-type "n1-standard-1" \
    --image "ubuntu-22-04-20240208044623" \
    --image-project $PROJECT_ID \
    --zone $ZONE \
    --metadata ssh-keys="agentsea:$(cat path/to/your/public/ssh/"


To install on a fresh Ubuntu VM, start from a cloud images base qcow2 image and run the install script:

curl -sSL | sudo bash

API Endpoints


  • GET /health - Checks the API's health.
    • Response: {"status": "ok"}

Mouse and Keyboard Control

  • GET /mouse_coordinates - Retrieves the current mouse coordinates.
    • Response Model: CoordinatesModel
  • POST /move_mouse - Moves the mouse to specified coordinates.
    • Request Body: MoveMouseModel
    • Response: {"status": "success"} or {"status": "error", "message": "<error_message>"}
  • POST /click - Clicks at the current or specified location.
    • Request Body: ClickModel
    • Response: {"status": "success"} or raises HTTPException
  • POST /double_click - Performs a double-click at the current mouse location.
    • Response: {"status": "success"} or raises HTTPException
  • POST /type_text - Types the specified text.
    • Request Body: TypeTextModel
    • Response: {"status": "success"} or raises HTTPException
  • POST /press_key - Presses a specified key.
    • Request Body: PressKeyModel
    • Response: {"status": "success"} or raises HTTPException
  • POST /scroll - Scrolls the mouse wheel.
    • Request Body: ScrollModel
    • Response: {"status": "success"} or raises HTTPException
  • POST /drag_mouse - Drags the mouse to specified coordinates.
    • Request Body: DragMouseModel
    • Response: {"status": "success"} or raises HTTPException
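
The endpoints above all take a JSON body over POST, so one small helper can cover them. This is a sketch only: the fields of the request models are not listed here, so payload keys such as `x` and `y` are assumptions to check against the model definitions, and the code assumes an HTTPException reaches the client as a non-2xx status.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # port forwarded in the QEMU example


def build_post(endpoint: str, payload: Optional[dict] = None,
               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request; a JSON body is attached only when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def call(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and fold HTTP error statuses into a status/message dict."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        return {"status": "error", "message": f"HTTP {exc.code}: {exc.reason}"}
```

For example, `call(build_post("/move_mouse", {"x": 100, "y": 200}))` would move the pointer if `MoveMouseModel` accepts those hypothetical keys, and `call(build_post("/double_click"))` sends no body at all.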

Web Browser Control

  • POST /open_url - Opens a URL in a Chromium-based browser.
    • Request Body: OpenURLModel
    • Response: {"status": "success"} or {"status": "error", "message": "<error_message>"}

Screen Capture

  • POST /screenshot - Takes a screenshot and returns it as a base64-encoded image.
    • Response Model: ScreenshotResponseModel
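
Since the screenshot arrives base64-encoded, a client has to decode it before saving or inspecting it. A minimal sketch, assuming the encoded data sits under a key named `image` in the response JSON; that key name and the PNG check are assumptions, so consult `ScreenshotResponseModel` for the actual fields and format.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict, field: str = "image") -> bytes:
    """Return the raw image bytes from a parsed screenshot response."""
    encoded = response[field]
    # Tolerate an optional data-URI prefix such as "data:image/png;base64,".
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


def looks_like_png(data: bytes) -> bool:
    """Check for the 8-byte PNG file signature."""
    return data.startswith(PNG_SIGNATURE)
```

The decoded bytes can then be written to disk with `open("screen.png", "wb").write(...)` or handed to an image library.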

Session Recording

  • POST /recordings - Starts a new recording session.
    • Request Body: RecordRequest
    • Response Model: RecordResponse
  • GET /recordings - Lists all recordings.
    • Response Model: Recordings
  • POST /recordings/{session_id}/stop - Stops a recording session.
    • Path Variable: session_id
    • Response: None (side effect: stops recording and saves to file)
  • GET /recordings/{session_id} - Retrieves information about a specific recording session.
    • Path Variable: session_id
    • Response Model: Recording
  • GET /recordings/{session_id}/event/{event_id} - Retrieves a specific event from a recording.
    • Path Variables: session_id, event_id
    • Response Model: RecordedEvent
  • DELETE /recordings/{session_id}/event/{event_id} - Deletes a specific event from a recording.
    • Path Variables: session_id, event_id
    • Response Model: Recording
  • GET /active_sessions - Lists IDs of all active recording sessions.
    • Response Model: Recordings
  • GET /recordings/{session_id}/actions - Retrieves all actions from a specific recording session.
    • Path Variable: session_id
    • Response Model: Actions
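
The recording routes above can be collected into a small lookup table so that a client fills in the path variables in one place. This is an illustrative sketch; the short route names (`start`, `stop`, and so on) are hypothetical labels chosen here, while the methods and path templates are the ones listed above.

```python
from typing import Tuple
from urllib.parse import quote

# Hypothetical short name -> (HTTP method, path template)
ROUTES = {
    "start": ("POST", "/recordings"),
    "list": ("GET", "/recordings"),
    "stop": ("POST", "/recordings/{session_id}/stop"),
    "get": ("GET", "/recordings/{session_id}"),
    "get_event": ("GET", "/recordings/{session_id}/event/{event_id}"),
    "delete_event": ("DELETE", "/recordings/{session_id}/event/{event_id}"),
    "active": ("GET", "/active_sessions"),
    "actions": ("GET", "/recordings/{session_id}/actions"),
}


def route(name: str, **params) -> Tuple[str, str]:
    """Return (method, path) with path variables percent-encoded.

    Raises KeyError if the name is unknown or a required variable is missing.
    """
    method, template = ROUTES[name]
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, template.format(**encoded)
```

For instance, `route("stop", session_id="abc123")` yields `("POST", "/recordings/abc123/stop")`, which can be appended to the base URL of the running agentd service.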


Come join us on Discord.


To pack a fresh set of images:

make pack

To run from this repo:

make run-jammy


License: MIT License
