crivetimihai / pywebagent

AI that turns website functionality into python APIs! Control websites through python and AI!


pywebagent - an experimental AI web agent


πŸ“ Description

  • pywebagent is an experimental Python package that controls websites using OpenAI's GPT-4 Vision.
  • In effect, it turns website capabilities into Python functions.
  • With pywebagent, you can automate complex tasks on websites, such as filling in forms and buying products.
  • It is especially useful for tasks that take multiple steps, such as buying a product online or booking a flight.
  • It is really, really experimental (expect plenty of hacky code).
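To make the "website capabilities as Python functions" idea concrete, here is a minimal sketch that binds a URL and a task to a callable. It assumes only the call shape used by the examples in the Usage section, `act(url, task, **kwargs)`. The names `make_site_function` and `fake_act` are hypothetical and are not part of pywebagent; the stub stands in for the real `act` so the snippet runs without a browser or an API key.

```python
from functools import partial
from typing import Any, Callable


def make_site_function(act_fn: Callable[..., Any], url: str, task: str) -> Callable[..., Any]:
    """Bind a URL and a task to `act_fn`, leaving keyword arguments open.

    Hypothetical helper for illustration; pywebagent does not ship it.
    """
    return partial(act_fn, url, task)


def fake_act(url: str, task: str, **kwargs: Any) -> dict:
    """Stub with the call shape act(url, task, **kwargs).

    It only records what it was asked to do. The dict it returns is an
    assumption made for this demo, not the behavior of the real `act`.
    """
    return {"url": url, "task": task, "args": kwargs}


# A reusable "function" for one website capability.
order_bunny = make_site_function(fake_act, "https://amazon.com", "Order a plush bunny")
result = order_bunny(email="user@example.com")
```

Passing the real `act` in place of `fake_act` should give the same kind of reusable function, assuming `act` accepts its first two arguments positionally as the Usage examples suggest.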

Highlights

  • Web navigation using the UI
  • Form filling
  • Buying products
  • Uploading files

Installation

Ensure you have Python version 3.6 or later installed. Then run this command in the terminal:

pip install pywebagent

Usage

Note: OPENAI_API_KEY must be set as an environment variable.
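A small pre-flight check can catch a missing key before the agent starts working. The sketch below uses only the standard library; the helper name `require_openai_key` is hypothetical and is not part of pywebagent.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; pywebagent does not ship it.
    `env` defaults to the process environment and can be replaced by a
    plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before calling pywebagent."
        )
    return key
```

Calling `require_openai_key()` at the top of a script, before `act(...)`, turns a missing key into an immediate and readable error.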

Example 1: Order a plush bunny on Amazon

from pywebagent import act

# The agent sometimes gets past the CAPTCHA on its own; at other times you will need to help it.
act(
    "https://amazon.com", 
    "Order a plush bunny", 
    email="<your amazon email>", 
    password="<your amazon password>"
)

Sped-up recording: amazon_5x_speed.MOV

Example 2: Order your photo prints from Mixtiles

This example uses pywebagent to order photo prints on mixtiles.com:

from pywebagent import act

act(
    "https://mixtiles.com/",
    "Order these as Mixtiles",
    name="John Doe",
    email="johndoe208909@gmail.com",
    photos=[
        "demo/mixtiles/1.jpg",
        "demo/mixtiles/2.jpg",
        "demo/mixtiles/3.jpg"
    ],
    payment_info={
        "card_number": "4242424242424242",
        "expiry_date": "12/22",
        "cvc": "123"
    },
    address="123 Main St, San Francisco, CA 94105"
)

Sped-up recording: mixtiles_5x_speed.MOV

πŸ› οΈ How It Works

The concept is extremely simple. Detect all elements that have an event handler (which means they can be interacted with), highlight them, take a screenshot, and ask GPT 4 Vision what to do. The results are surprisingly good!
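The loop described above can be sketched in plain Python. This is an illustration of the idea only, not pywebagent's actual implementation: the `Element` class, `label_interactive`, `choose_action`, and `fake_model` are all hypothetical names, a text listing stands in for the highlighted screenshot, and a stub function stands in for GPT-4 Vision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Element:
    """A page element; `has_handler` marks it as interactive."""
    tag: str
    text: str
    has_handler: bool


def label_interactive(elements: List[Element]) -> Dict[int, Element]:
    """Assign a numeric label to every element that has an event handler."""
    interactive = [e for e in elements if e.has_handler]
    return {i: e for i, e in enumerate(interactive, start=1)}


def choose_action(
    labeled: Dict[int, Element],
    task: str,
    ask_model: Callable[[str], str],
) -> Element:
    """Describe the labeled elements to the model and return its pick.

    A real agent would send a screenshot with highlighted elements to a
    vision model; here a text listing stands in for the screenshot.
    """
    listing = "\n".join(f"[{i}] <{e.tag}> {e.text}" for i, e in labeled.items())
    prompt = f"Task: {task}\nElements:\n{listing}\nReply with one label number."
    reply = ask_model(prompt)
    return labeled[int(reply.strip())]


def fake_model(prompt: str) -> str:
    """Stand-in for the vision model: always picks the element labeled 2."""
    return "2"


page = [
    Element("div", "Welcome", has_handler=False),
    Element("input", "Search box", has_handler=True),
    Element("button", "Add to cart", has_handler=True),
]
labeled = label_interactive(page)
picked = choose_action(labeled, "Order a plush bunny", fake_model)
print(picked.text)
```

Numbering only the interactive elements keeps the model's answer small (a single label), which is easy to parse and map back to a concrete element to click or fill.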

Upcoming Features

  • Allow act to return a result (e.g., the order number of a purchase)
  • Add support for more types of interaction (including scrolling and swiping)
  • Add caching to speed everything up
  • Support open-source vision models
  • Support more complicated actions

Contributing

Contributions are more than welcome! In fact, we're looking for people who want to develop this further.
If you have suggestions, feature requests, or bug reports, please open an issue first to discuss what you would like to change. For changes, please open a pull request.

Community

Feel free to join our Discord at https://discord.gg/5eJkjMMa.

License

pywebagent is licensed under the MIT License. See LICENSE for more information.

⚠️ Disclaimer

pywebagent is an experimental project, and is not officially affiliated with OpenAI.

Contact

If you have specific questions about using pywebagent, feel free to email us at maximkgn@gmail.com or ori.kabeli@gmail.com.


Languages

JavaScript 57.9%, Python 42.1%