EachSheep / ShortcutsBench

ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents

Home Page:https://github.com/EachSheep/ShortcutsBench

🔧ShortcutsBench📱

Read this in Chinese (中文).

What are Shortcuts?

Shortcuts are workflows🔄 created by developers in the Shortcuts app using a user-friendly graphical interface🖼️. According to Apple, they are "a quick way to get one or more tasks done with your apps."📱✨

How can this project help you?

At Apple's WWDC'24, many AI features were introduced to Apple devices🤖. We're very interested in how Apple integrates large language models, such as ChatGPT, with devices to provide a smarter user experience📱💡. In this process, shortcuts will certainly play a crucial role!🚀

  • As a Shortcuts user📱:

    • You can find your favorite shortcuts in this dataset📱.
    • You can integrate more shortcuts into your Apple devices to have Siri handle complex tasks🗣️.
    • ......
  • As a Shortcuts enthusiast💡:

    • You can use the vast number of shortcut links (and corresponding source files) in this dataset to study how to write shortcuts and customize your workflows💡.
    • You can contribute more shortcuts to this project📤.
    • ......
  • As a researcher🔬:

    • Study the construction of automated workflows: Shortcuts are essentially workflows composed of a series of API calls (actions) provided by Apple and third-party apps🔍.
    • Study low-code programming: Shortcuts include code features such as branching, looping, and variable assignment while having a user-friendly graphical interface🖥️.
    • Study API-based agents: Allow large language models to autonomously decide if, when, and how to use APIs based on user queries (tasks)🔧.
    • Study how to fine-tune large language models with shortcuts to closely integrate language models with phones, computers, and smartwatches, realizing the vision of an "LLM-based operating system"📈.
    • ......

🌟Advantages🌟

  • Data Scale and Quality: Our dataset covers 88 apps and 1414 types of APIs. All data is collected from real-world shortcut collection sites.
  • User Queries:
    • Diversity of Difficulty: Constructing a dataset with diverse APIs and user queries helps to distinguish the capabilities of different agents. Our dataset includes a variety of user queries, totaling 7628 queries, with an average of 7.86 APIs per query and an average action sequence length of 21.46.
    • Reflecting Real User Needs: We use natural language workflow descriptions of shortcuts to construct prompts, inputting them into large language models to generate accurate queries.
    • Comprehensive Content: Our user queries include the basic data types required for API calls, which helps to comprehensively evaluate the capabilities of the agents.
  • Multi-dimensional Evaluation:
    • Accuracy of Parameter Filling: Effective and accurate parameter selection is crucial for task completion. Our user queries include the basic data types required for API calls, allowing for the evaluation of the agent's parameter filling ability.
    • Ability to Ask for Missing Information: Agents should be able to ask the system or the user for information that is missing from a query.
    • Accuracy of API Selection: API selection is the most fundamental decision-making ability of an agent.

If you find this project helpful, please give us a star⭐️! Thank you for your support!🙏

Keywords: Shortcuts, Apple, WWDC'24, Siri, iOS, macOS, watchOS, Workflow, API Call, Low-Code Programming, Intelligent Agent, Large Language Model

What can Shortcuts do for you?

Shortcuts can help you complete various complex tasks with one click!

Want more?✨

Check out the shortcuts we collected in this project 📂.

Project Task List (Continuously Updating)📋

  • Shortcuts Dataset: Includes shortcut metadata (title, description, source, etc.), iCloud links, and shortcut source files.
  • APIs Involved in Shortcuts: Includes API metadata (function description, name, parameter names, parameter types, parameter default values, return value names, etc.) and the apps themselves📱.
  • How do shortcuts promote the development of intelligent agents? Stay tuned for our upcoming work!🚀

User Guide for Shortcuts Users📱

Search for the Shortcuts You Want🔍

Wondering where our shortcuts are? How to search for the shortcuts you need in this project? Follow these steps:

  1. In this repository, dataset/${website name}/${category name}/README.md files record the metadata of all shortcuts in that category, including name, description, iCloud download link, etc. Each README.md file is structured as follows:
    ### Name: Wine Shops # Shortcut name
    - URL: https://www.icloud.com/shortcuts/78ffd18288fd4da286bfd570993ea46e # Shortcut iCloud link
    - Source: https://shortcutsgallery.com # Shortcut source store
    - Description: Look for Wine shop near by you # Shortcut function description
  2. Use Ctrl+F (Cmd+F on macOS) in the browser to search the page by shortcut name keywords🔎.
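
For scripted searches, the README layout described in step 1 can also be parsed with a few lines of Python. This is only a minimal sketch: it assumes every entry starts with a `### Name:` line followed by `- Key: value` lines, exactly as in the sample, and that the trailing `# ...` annotations shown in the sample are explanatory and not present in the real files. The helper names `parse_readme` and `search` are hypothetical and not part of this repository.

```python
import re

# Sample entry copied from the README structure shown above (annotations removed).
SAMPLE = """\
### Name: Wine Shops
- URL: https://www.icloud.com/shortcuts/78ffd18288fd4da286bfd570993ea46e
- Source: https://shortcutsgallery.com
- Description: Look for Wine shop near by you
"""


def parse_readme(text):
    """Return one dict per shortcut entry found in a category README."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"^###\s*Name:\s*(.*)$", line)
        if header:
            # A new "### Name:" line starts a new entry.
            entries.append({"Name": header.group(1).strip()})
            continue
        field = re.match(r"^-\s*(\w+):\s*(.*)$", line)
        if field and entries:
            # "- Key: value" lines belong to the most recent entry.
            entries[-1][field.group(1)] = field.group(2).strip()
    return entries


def search(entries, keyword):
    """Case-insensitive keyword match on the shortcut name."""
    keyword = keyword.lower()
    return [e for e in entries if keyword in e.get("Name", "").lower()]


shortcuts = parse_readme(SAMPLE)
matches = search(shortcuts, "wine")
for entry in matches:
    print(entry["Name"], entry["URL"])
```

To search a real category, read the corresponding `dataset/${website name}/${category name}/README.md` file and pass its text to `parse_readme` in place of `SAMPLE`.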

You can also visit Shortcut Collection Sites to search for the shortcuts you want🌐.

How to Import the Shortcuts You Found📥

On an Apple device, clicking the iCloud link in the URL field opens the Shortcuts app so that you can import the shortcut📲.

User Guide for Developers and Researchers📚

Obtain the Dataset

You can download shortcuts one by one from the iCloud links in the User Guide or get the complete data from the following links:

Data Sources and Links 🌐

Data Source         Metadata Location       Cloud Drive Link
Matthewcassinelli   Location in our Repo    Google Cloud | Baidu Netdisk
Routinehub          Location in our Repo    Google Cloud | Baidu Netdisk
MacStories          Location in our Repo    Google Cloud | Baidu Netdisk
ShareShortcuts      Location in our Repo    Google Cloud | Baidu Netdisk
ShortcutsGallery    Location in our Repo    Google Cloud | Baidu Netdisk
iSpazio             Location in our Repo    Google Cloud | Baidu Netdisk
Jiejingku           Location in our Repo    Google Cloud | Baidu Netdisk
Sspai               Location in our Repo    Google Cloud | Baidu Netdisk
Jiejing.Fun         Location in our Repo    Google Cloud | Baidu Netdisk
kejicut             Location in our Repo    Google Cloud | Baidu Netdisk
rcuts               Location in our Repo    Google Cloud | Baidu Netdisk

Source File Structure of Shortcuts

The shortcut source data on the cloud drives is organized in one of the following two directory structures:

dataset/
├── matthewcassinelli.com_sirishortcuts_library_free # Website name
│   ├── file1
│   ├── file2
│   └── file3

or

dataset/
├── jiejingku.net # Website name
│   ├── category1 # Category 
│   │   ├── file1 # Each specific shortcut
│   │   └── file2
│   ├── category2
│   │   └── file3

Each file represents a shortcut. The file name is generated from the shortcut name after simple processing, with the following code:

import re
file_name = re.sub(r'[^a-zA-Z0-9]', '_', name)  # replace every character outside a-z, A-Z, 0-9 with '_'
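
As an illustration of that rule, the sketch below wraps the same expression in a helper and applies it to a few names. The helper name `shortcut_file_name` is hypothetical, and the second and third names are made up for the example.

```python
import re


def shortcut_file_name(name):
    # Same substitution as above: every character outside a-z, A-Z, 0-9 becomes '_'.
    return re.sub(r'[^a-zA-Z0-9]', '_', name)


print(shortcut_file_name("Wine Shops"))     # Wine_Shops
print(shortcut_file_name("Wi-Fi QR (v2)"))  # Wi_Fi_QR__v2_
print(shortcut_file_name("快捷指令"))        # ____
```

One consequence of the regex is that different shortcut names can map to the same file name, and names written entirely in non-ASCII characters collapse to underscores, so it may be safer to match files against the README metadata than to rely on file names alone.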

The shortcut source files we provide are in JSON format. Shortcuts exported from Apple devices are either iCloud links (shared as links) or encrypted shortcut files (with the .shortcut suffix).
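
To give a feel for working with these JSON files, here is a minimal sketch that extracts the action (API call) sequence from one shortcut. It assumes the JSON mirrors the property-list layout used by the Shortcuts app, in which a top-level `WFWorkflowActions` list holds one dictionary per action with the keys `WFWorkflowActionIdentifier` and `WFWorkflowActionParameters`. The helper names and the tiny in-memory example are hypothetical stand-ins, not data from this repository.

```python
import json
from collections import Counter


def load_shortcut(path):
    """Load one shortcut source file (JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def action_identifiers(shortcut):
    """Return the ordered list of action identifiers in a shortcut."""
    return [
        action.get("WFWorkflowActionIdentifier", "<unknown>")
        for action in shortcut.get("WFWorkflowActions", [])
    ]


# Hand-written stand-in for a real file; identifiers and parameters are illustrative.
example = {
    "WFWorkflowActions": [
        {
            "WFWorkflowActionIdentifier": "is.workflow.actions.gettext",
            "WFWorkflowActionParameters": {"WFTextActionText": "Hello"},
        },
        {
            "WFWorkflowActionIdentifier": "is.workflow.actions.showresult",
            "WFWorkflowActionParameters": {},
        },
    ]
}

ids = action_identifiers(example)
api_counts = Counter(ids)  # how often each API appears in the workflow
print(len(ids), "actions,", len(api_counts), "distinct APIs")
```

For a real file, call `action_identifiers(load_shortcut(path))` with a path from the downloaded dataset.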

If you wish to import a shortcut source file into the Shortcuts app, please follow these steps on macOS:

  • Convert the JSON file format to PLIST file format 📑.
  • Sign the PLIST file 🔏.
  • Import the signed file into the Shortcuts app 📲.
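
The three steps above can be sketched in Python as follows. This is a hedged outline, not the project's own tooling: it uses the standard-library `plistlib` for the conversion and assumes the macOS `shortcuts` command-line tool for signing (check `shortcuts sign --help` on your system for the exact options). The file names and helper names are hypothetical. Note also that `plistlib` cannot serialize JSON `null` values, and that how binary values are represented in the dataset's JSON is not specified here, so some files may need extra preprocessing.

```python
import json
import plistlib


def json_to_plist_bytes(json_text):
    """Step 1: convert shortcut JSON text to XML property-list bytes."""
    data = json.loads(json_text)
    # plistlib raises TypeError if the data contains None (JSON null).
    return plistlib.dumps(data, fmt=plistlib.FMT_XML)


def sign_command(unsigned_path, signed_path):
    """Step 2: build (but do not run) the assumed `shortcuts sign` invocation."""
    return [
        "shortcuts", "sign",
        "--mode", "anyone",
        "--input", unsigned_path,
        "--output", signed_path,
    ]


# Minimal stand-in for the contents of a real shortcut source file.
plist_bytes = json_to_plist_bytes('{"WFWorkflowActions": []}')
cmd = sign_command("unsigned.shortcut", "signed.shortcut")
print(" ".join(cmd))

# On macOS one would then write plist_bytes to unsigned.shortcut, run
# subprocess.run(cmd, check=True), and open signed.shortcut to import it (step 3).
```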

License Statement 📜

All code and datasets in this project are licensed under the Apache License 2.0. This means you are free to use, copy, modify, and distribute the contents of this project, but must comply with the following conditions:

  • Copyright Notice: The original copyright notice and license statement must be retained in all copies of the project.
  • State Changes: If you modify the code, you must indicate the changes made in any modified files.
  • Trademark Use: This license does not grant the right to use trademarks, service marks, or trade names of the project.

For the full license text, see LICENSE.

Additionally, you must comply with the license agreements of the data sources from each shortcut sharing site.
