A daemon that makes a desktop OS accessible to AI agents.
AgentD makes a desktop OS accessible to AI agents by exposing an HTTP API.
For a higher level interface see AgentDesk.
AgentD is currently tested on the Ubuntu 22.04 cloud image.
We recommend using one of our base VMs, which are already configured.
For Qemu, download the qcow2 image:
wget https://storage.googleapis.com/agentsea-vms/jammy/latest/agentd-jammy.qcow2
To use the image, we need to make a cloud-init ISO with our user-data. See this tutorial; below is how it looks on macOS:
xorriso -as mkisofs -o cidata.iso -V "cidata" -J -r -iso-level 3 meta/
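The xorriso command above packs the contents of a meta/ directory into an ISO labeled "cidata", which is what cloud-init's NoCloud datasource looks for. That datasource expects two files named user-data and meta-data. The following is a minimal sketch of generating such a directory; the user-data contents mirror the cloud-config shown for AWS in this README and are an assumption (the project's own user-data may differ), and write_seed_dir and the instance_id default are hypothetical names.

```python
from pathlib import Path


def write_seed_dir(seed_dir: str, ssh_public_key: str,
                   instance_id: str = "agentd-local") -> Path:
    """Write NoCloud seed files (user-data and meta-data) into seed_dir."""
    root = Path(seed_dir)
    root.mkdir(parents=True, exist_ok=True)

    # Assumed user-data: modeled on the cloud-config used for AWS in this README.
    user_data = "\n".join([
        "#cloud-config",
        "users:",
        "  - name: agentsea",
        "    sudo: ['ALL=(ALL) NOPASSWD:ALL']",
        "    groups: sudo",
        "    ssh_authorized_keys:",
        f"      - {ssh_public_key}",
        "",
    ])
    # meta-data identifies the instance; the values here are placeholders.
    meta_data = f"instance-id: {instance_id}\nlocal-hostname: {instance_id}\n"

    (root / "user-data").write_text(user_data)
    (root / "meta-data").write_text(meta_data)
    return root
```

Calling write_seed_dir("meta", "<your public key>") before running xorriso would produce the meta/ directory that the command packs.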
Then the image can be run with Qemu:
qemu-system-x86_64 -nographic -hda ./agentd-jammy.qcow2 \
-m 4G -smp 2 -netdev user,id=vmnet,hostfwd=tcp::6080-:6080,hostfwd=tcp::8000-:8000,hostfwd=tcp::2222-:22 \
-device e1000,netdev=vmnet -cdrom cidata.iso
Once running, the agentd service can be accessed:
curl localhost:8000/health
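The VM can take a while to boot, so the health check above may fail at first. As a rough sketch, a script could poll GET /health until it answers with {"status": "ok"}. The function name wait_for_agentd, the timeout values, and the injectable opener parameter are illustrative assumptions, not part of AgentD.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_agentd(base_url="http://localhost:8000", timeout=120.0,
                    interval=2.0, opener=urllib.request.urlopen):
    """Poll GET /health until it reports ok; return False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{base_url}/health", timeout=5) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # VM still booting, port not forwarded yet, or body not valid JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The opener parameter exists only so the loop can be exercised without a running VM; in normal use the default urllib.request.urlopen is enough.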
To log in to the machine:
ssh -p 2222 agentsea@localhost
For AWS, use the public AMI ami-01a893c1530453073.
Create a cloud-init script with your ssh key:
#cloud-config
users:
  - name: agentsea
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    groups: sudo
    ssh_authorized_keys:
      - your-ssh-public-key
package_upgrade: true
Then launch an instance with that script as user data:
aws ec2 run-instances \
--image-id ami-01a893c1530453073 \
--count 1 \
--instance-type t2.micro \
--key-name $KEY_NAME \
--security-group-ids $SG_NAME \
--subnet-id $SUBNET_NAME \
--user-data file://path/to/cloud-init-config.yaml
For GCE, use the public image ubuntu-22-04-20240208044623.
gcloud compute instances create $NAME \
--machine-type "n1-standard-1" \
--image "ubuntu-22-04-20240208044623" \
--image-project $PROJECT_ID \
--zone $ZONE \
--metadata ssh-keys="agentsea:$(cat path/to/your/public/ssh/key.pub)"
If you want to install on a fresh Ubuntu VM, use a cloud images base qcow2 image and run the install script:
curl -sSL https://raw.githubusercontent.com/agentsea/agentd/main/remote_install.sh | sudo bash
- GET /health - Checks the API's health.
  - Response: {"status": "ok"}
- GET /mouse_coordinates - Retrieves the current mouse coordinates.
  - Response Model: CoordinatesModel
- POST /move_mouse - Moves the mouse to specified coordinates.
  - Request Body: MoveMouseModel
  - Response: {"status": "success"} or {"status": "error", "message": "<error_message>"}
- POST /click - Clicks at the current or specified location.
  - Request Body: ClickModel
  - Response: {"status": "success"} or raises HTTPException
- POST /double_click - Performs a double-click at the current mouse location.
  - Response: {"status": "success"} or raises HTTPException
- POST /type_text - Types the specified text.
  - Request Body: TypeTextModel
  - Response: {"status": "success"} or raises HTTPException
- POST /press_key - Presses a specified key.
  - Request Body: PressKeyModel
  - Response: {"status": "success"} or raises HTTPException
- POST /scroll - Scrolls the mouse wheel.
  - Request Body: ScrollModel
  - Response: {"status": "success"} or raises HTTPException
- POST /drag_mouse - Drags the mouse to specified coordinates.
  - Request Body: DragMouseModel
  - Response: {"status": "success"} or raises HTTPException
- POST /open_url - Opens a URL in a Chromium-based browser.
  - Request Body: OpenURLModel
  - Response: {"status": "success"} or {"status": "error", "message": "<error_message>"}
- POST /screenshot - Takes a screenshot and returns it as a base64-encoded image.
  - Response Model: ScreenshotResponseModel
- POST /recordings - Starts a new recording session.
  - Request Body: RecordRequest
  - Response Model: RecordResponse
- GET /recordings - Lists all recordings.
  - Response Model: Recordings
- POST /recordings/{session_id}/stop - Stops a recording session.
  - Path Variable: session_id
  - Response: None (side effect: stops recording and saves to file)
- GET /recordings/{session_id} - Retrieves information about a specific recording session.
  - Path Variable: session_id
  - Response Model: Recording
- GET /recordings/{session_id}/event/{event_id} - Retrieves a specific event from a recording.
  - Path Variables: session_id, event_id
  - Response Model: RecordedEvent
- DELETE /recordings/{session_id}/event/{event_id} - Deletes a specific event from a recording.
  - Path Variables: session_id, event_id
  - Response Model: Recording
- GET /active_sessions - Lists IDs of all active recording sessions.
  - Response Model: Recordings
- GET /recordings/{session_id}/actions - Retrieves all actions from a specific recording session.
  - Path Variable: session_id
  - Response Model: Actions
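The endpoints above can be called from any HTTP client. Below is a minimal sketch of a JSON helper built on Python's standard library. AgentDClient and save_screenshot are hypothetical names, and the request and response field names used here ("x", "y", "text", "image") are assumptions, since this list names the models (MoveMouseModel, TypeTextModel, ScreenshotResponseModel) but not their fields; check the server's schema before relying on them.

```python
import base64
import json
import urllib.request


class AgentDClient:
    """Small JSON-over-HTTP helper for the endpoints listed above."""

    def __init__(self, base_url="http://localhost:8000",
                 opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self._open = opener  # injectable so the client can be tested offline

    def build_request(self, method, path, payload=None):
        data = None
        headers = {}
        if payload is not None:
            data = json.dumps(payload).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    def call(self, method, path, payload=None):
        """Send one request and return the decoded JSON body (None if empty)."""
        request = self.build_request(method, path, payload)
        with self._open(request, timeout=30) as resp:
            body = resp.read()
        return json.loads(body) if body else None


def save_screenshot(client, out_path, image_field="image"):
    """POST /screenshot and write the decoded bytes to out_path.

    image_field is an assumed key of ScreenshotResponseModel.
    """
    result = client.call("POST", "/screenshot")
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result[image_field]))
    return out_path


# Example usage against a running VM (payload field names are assumptions):
#   client = AgentDClient()
#   client.call("GET", "/health")
#   client.call("POST", "/move_mouse", {"x": 100, "y": 200})
#   client.call("POST", "/type_text", {"text": "hello"})
#   client.call("GET", "/recordings")
#   save_screenshot(client, "screen.png")
```

Keeping the helper generic (method, path, optional payload) avoids hard-coding model fields that this README does not spell out.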
Come join us on Discord.
To pack a fresh set of images:
make pack
To run from this repo:
make run-jammy