Note: This repository is under construction. Some subtasks/tools are not fully supported yet.

arXiv Preprint
CoSTA* is a cost-sensitive toolpath agent designed to solve multi-turn image editing tasks efficiently. It integrates Large Language Models (LLMs) and graph search algorithms to dynamically select AI tools while balancing cost and quality. Unlike traditional text-to-image models (e.g., Stable Diffusion, DALLE-3), which struggle with complex image editing workflows, CoSTA* constructs an optimal toolpath using an LLM-guided hierarchical planning strategy and an A* search-based selection process.
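To make the A* search idea above concrete, here is a toy sketch of best-first search over a small tool graph. This is purely illustrative and is not CoSTA*'s actual implementation: the graph, tool names, edge weights, and the zero heuristic (which reduces A* to Dijkstra's algorithm) are all assumptions.

```python
# Toy A*-style search over a hypothetical tool graph. Nodes are tools,
# edge weights stand in for combined cost-quality values, and the
# heuristic defaults to 0 (Dijkstra-like behavior).
import heapq

def a_star(graph, start, goal, heuristic=lambda n: 0.0):
    """Return (total_cost, path) for the cheapest start->goal toolpath."""
    frontier = [(heuristic(start), 0.0, start, [start])]
    visited = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in visited:
                new_cost = cost + weight
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt])
                )
    return float("inf"), []

# Hypothetical subgraph with two candidate detectors for one subtask:
graph = {
    "input": {"yolov7": 0.3, "groundingdino": 0.5},
    "yolov7": {"sam": 0.4},
    "groundingdino": {"sam": 0.1},
    "sam": {"output": 0.2},
}
cost, path = a_star(graph, "input", "output")
```

With these made-up weights the search prefers the `groundingdino` branch (total 0.8) over the `yolov7` branch (total 0.9), illustrating how edge weights steer tool selection.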
This repository provides:
- The official codebase for CoSTA*.
- Scripts to generate and optimize toolpaths for multi-turn image editing.
Try out CoSTA* online: Live Demo
We provide a benchmark dataset with 121 images for testing CoSTA*, containing image-only and text+image tasks.
Dataset: Hugging Face Dataset
- Hierarchical Planning: Uses LLMs to decompose a task into a subtask tree, which is used for constructing the final Tool Subgraph.
- Optimized Tool Selection: A* search is applied on the Tool Subgraph for cost-efficient, high-quality pathfinding.
- Multimodal Support: Switches between text and image modalities for enhanced editing.
- Quality Evaluation via VLM: Automatically assesses tool outputs to estimate the actual quality before progressing further.
- Adaptive Retry Mechanism: If the output doesn't meet the quality threshold, it is retried with updated hyperparameters.
- Balancing Cost vs. Quality: A* search does not just minimize cost but also optimizes quality, allowing users to adjust α (alpha) to control the cost vs. quality trade-off.
- Supports 24 AI Tools: Integrates YOLO, GroundingDINO, Stable Diffusion, CLIP, SAM, DALL-E, and more.
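The α trade-off described above can be sketched as a weighted combination of execution cost and a quality penalty. The formula and function name below are assumptions for illustration, not the repository's actual code:

```python
# Hypothetical sketch of combining execution cost and predicted quality
# into a single value for search. alpha = 1 purely minimizes cost;
# alpha = 0 purely maximizes quality (exact formula is an assumption).
def edge_cost(exec_cost: float, quality: float, alpha: float) -> float:
    """Weight execution cost against the quality shortfall (1 - quality)."""
    return alpha * exec_cost + (1 - alpha) * (1 - quality)

# A cheap low-quality tool vs. an expensive high-quality one at alpha = 0.5:
fast_tool = edge_cost(exec_cost=0.2, quality=0.60, alpha=0.5)  # 0.3
slow_tool = edge_cost(exec_cost=0.8, quality=0.95, alpha=0.5)  # 0.425
```

At α = 0 the quality term dominates, so the higher-quality tool wins despite its cost; raising α shifts preference toward the cheaper tool.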
git clone https://github.com/tianyi-lab/CoSTAR.git
cd CoSTAR

Ensure you have Python 3.8+ and install the dependencies (most other dependencies are auto-installed when the models are run):

pip install -r requirements.txt

The required pre-trained model checkpoints must be downloaded from Google Drive and placed in the checkpoints/ folder. The download link is provided in checkpoints/checkpoints.txt.
Note: The API keys for OpenAI and StabilityAI need to be set in run.py before executing.

To execute CoSTA*, run:

python run.py --image path/to/image.png --prompt "Edit this image" --output output.json --output_image final.png --alpha 0

Example:

python run.py --image inputs/sample.jpg --prompt "Replace the cat with a dog and expand the image" --output Tree.json --output_image final_output.png --alpha 0

Arguments:
- --image: Path to the input image.
- --prompt: Instruction for editing.
- --output: Path to save the generated subtask tree.
- --output_image: Path to save the final output image.
- --alpha: Cost-quality trade-off parameter.
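The flags above presumably map to a standard argument parser in run.py; the sketch below is a hedged illustration of how such a parser could look (the defaults and the `build_parser` name are assumptions, not the actual run.py code):

```python
# Illustrative argparse setup mirroring the documented run.py flags.
# Defaults and function name are assumptions, not the repository's code.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CoSTA* multi-turn image editing")
    parser.add_argument("--image", required=True, help="Path to the input image")
    parser.add_argument("--prompt", required=True, help="Instruction for editing")
    parser.add_argument("--output", default="Tree.json",
                        help="Path to save the generated subtask tree")
    parser.add_argument("--output_image", default="final.png",
                        help="Path to save the final output image")
    parser.add_argument("--alpha", type=float, default=0.0,
                        help="Cost-quality trade-off parameter")
    return parser

args = build_parser().parse_args(
    ["--image", "inputs/sample.jpg", "--prompt", "Replace the cat with a dog"]
)
```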
The main functions in the following scripts need to be uncommented, and the paths, hyperparameters, and API keys must be modified before execution.

1. Modify subtask_tree.py by providing the input image path and prompt, then run:

   python subtask_tree.py

2. Modify tool_subgraph.py to use the generated Tree.json, then run:

   python tool_subgraph.py

3. Modify astar_search.py with updated paths and hyperparameters, then run:

   python astar_search.py

A step-by-step live example can be found in Demo.ipynb, an interactive Jupyter Notebook for understanding the workflow.
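The three manual stages above can be pictured as one chained pipeline. Everything in this sketch is an illustrative stub with invented names and toy logic, not the repository's actual API:

```python
# Hypothetical end-to-end sketch of the three manual stages.
# All functions here are illustrative stubs, not CoSTA*'s real code.
def build_subtask_tree(image_path: str, prompt: str) -> dict:
    # subtask_tree.py stage: LLM decomposes the prompt into subtasks (stubbed
    # here as a naive split on " and ").
    return {"image": image_path, "subtasks": prompt.split(" and ")}

def build_tool_subgraph(tree: dict) -> dict:
    # tool_subgraph.py stage: map each subtask to candidate tools (stubbed).
    return {subtask: ["toolA", "toolB"] for subtask in tree["subtasks"]}

def astar_search(subgraph: dict, alpha: float = 0.0) -> list:
    # astar_search.py stage: pick one tool per subtask (stubbed; the real
    # code runs A* over the subgraph balancing cost and quality).
    return [candidates[0] for candidates in subgraph.values()]

tree = build_subtask_tree(
    "inputs/sample.jpg", "Replace the cat with a dog and expand the image"
)
toolpath = astar_search(build_tool_subgraph(tree), alpha=0.0)
```

The point of the sketch is the data flow: the subtask tree feeds the tool subgraph, which feeds the search; in the repository each stage instead reads and writes intermediate files such as Tree.json.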
CoSTAR/
├── checkpoints/
│   └── checkpoints.txt
├── configs/
│   └── tools.yaml
├── inputs/
│   └── 40.jpeg
├── outputs/
│   └── final.png
├── prompts/
│   └── 40.txt
├── requirements/
│   ├── craft.txt
│   ├── deblurgan.txt
│   ├── easyocr.txt
│   ├── google_cloud.txt
│   ├── groundingdino.txt
│   ├── magicbrush.txt
│   ├── realesrgan.txt
│   ├── sam.txt
│   ├── stability.txt
│   └── yolo.txt
├── results/
│   ├── final.png
│   ├── img1.png
│   ├── img2.png
│   ├── img3.png
│   ├── img4.png
│   └── img5.png
├── tools/
│   ├── dalleimage.py
│   ├── groundingdino.py
│   ├── sam.py
│   ├── stabilityoutpaint.py
│   ├── yolov7.py
│   └── ...
├── .gitignore
├── LICENSE
├── README.md
├── Demo.ipynb
├── run.py
├── subtask_tree.py
├── tool_subgraph.py
└── astar_search.py

If you find this work useful, please cite our paper:
@misc{gupta2025costaastcostsensitivetoolpathagent,
title={CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing},
author={Advait Gupta and NandaKiran Velaga and Dang Nguyen and Tianyi Zhou},
year={2025},
eprint={2503.10613},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.10613},
}