
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

📌 This repository is under construction. Some subtasks/tools are not fully supported yet.

🔗 ArXiv Preprint


Introduction

CoSTA* is a cost-sensitive toolpath agent designed to solve multi-turn image editing tasks efficiently. It integrates Large Language Models (LLMs) and graph search algorithms to dynamically select AI tools while balancing cost and quality. Unlike traditional text-to-image models (e.g., Stable Diffusion, DALL-E 3), which struggle with complex image editing workflows, CoSTA* constructs an optimal toolpath using an LLM-guided hierarchical planning strategy and an A* search-based selection process.

Pipeline

This repository provides:

  • The official codebase for CoSTA*.
  • Scripts to generate and optimize toolpaths for multi-turn image editing.

Live Demo

Try out CoSTA* online: Live Demo


Dataset

We provide a benchmark dataset with 121 images for testing CoSTA*, containing image-only and text+image tasks.

📂 Dataset: Huggingface Dataset
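If you want to iterate over the benchmark programmatically, the snippet below is a minimal sketch using the Hugging Face datasets library. The dataset ID is a placeholder (take the real one from the page linked above), and no schema is assumed: print the dataset object to inspect its splits and columns before relying on field names.

from datasets import load_dataset

# Placeholder ID: replace with the actual dataset ID from the Hugging Face page above.
DATASET_ID = "<huggingface-dataset-id>"

ds = load_dataset(DATASET_ID)
print(ds)  # inspect available splits and columns before assuming a schema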


Features

✅ Hierarchical Planning – Uses LLMs to decompose a task into a subtask tree, which is then used to construct the final Tool Subgraph.
✅ Optimized Tool Selection – A* search is applied on the Tool Subgraph for cost-efficient, high-quality pathfinding.
✅ Multimodal Support – Switches between text and image modalities for enhanced editing.
✅ Quality Evaluation via VLM – Automatically assesses tool outputs to estimate the actual quality before progressing further.
✅ Adaptive Retry Mechanism – If an output doesn't meet the quality threshold, the tool is retried with updated hyperparameters.
✅ Balancing Cost vs. Quality – A* search does not just minimize cost but also optimizes quality, allowing users to adjust α (alpha) to control the cost vs. quality trade-off (see the illustrative sketch after this list).
✅ Supports 24 AI Tools – Integrates YOLO, GroundingDINO, Stable Diffusion, CLIP, SAM, DALL-E, and more.
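The α parameter exposed by run.py (see Usage below) controls how execution cost is weighted against expected quality during A* search. The exact objective is defined in the paper; the function below is only an illustrative sketch of such a trade-off, with all numbers invented for the example.

def combined_cost(exec_cost: float, quality: float, alpha: float) -> float:
    # Illustrative alpha-weighted trade-off (not the paper's exact formula):
    # alpha = 1 weighs only execution cost, alpha = 0 weighs only quality.
    return alpha * exec_cost + (1.0 - alpha) * (1.0 - quality)

# A cheap but mediocre tool vs. a more expensive but accurate one:
print(combined_cost(exec_cost=0.2, quality=0.60, alpha=0.3))  # ≈ 0.34
print(combined_cost(exec_cost=0.8, quality=0.95, alpha=0.3))  # ≈ 0.275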


Installation

1. Clone the Repository

git clone https://github.com/tianyi-lab/CoSTAR.git  
cd CoSTAR  

2. Install Dependencies

Ensure you have Python 3.8+ and install the base dependencies (tool-specific dependencies are installed automatically the first time the corresponding models are run):

pip install -r requirements.txt  

3. Download Pre-trained Checkpoints

The required pre-trained model checkpoints must be downloaded from Google Drive and placed in the checkpoints/ folder. The link to download the checkpoints is provided in checkpoints/checkpoints.txt.


Usage

Note: The OpenAI and StabilityAI API keys must be set in run.py before execution. To execute CoSTA*, run:

python run.py --image path/to/image.png --prompt "Edit this image" --output output.json --output_image final.png --alpha 0  

Example:

python run.py --image inputs/sample.jpg --prompt "Replace the cat with a dog and expand the image" --output Tree.json --output_image final_output.png --alpha 0
  • --image: Path to input image.
  • --prompt: Instruction for editing.
  • --output: Path to save generated subtask tree.
  • --output_image: Path to save the final output.
  • --alpha: Cost-quality trade-off parameter.
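To apply the same prompt-driven pipeline to several images, a small wrapper around the documented flags above is enough. The sketch below shells out to run.py for every image in inputs/ and looks up a prompt file with the same stem in prompts/; the one-prompt-per-image layout is an assumption made for this example.

import subprocess
from pathlib import Path

input_dir = Path("inputs")     # images to edit
prompt_dir = Path("prompts")   # assumed: one <stem>.txt prompt per image
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)

for image in sorted(input_dir.glob("*.jpeg")):
    prompt = (prompt_dir / f"{image.stem}.txt").read_text().strip()
    subprocess.run(
        [
            "python", "run.py",
            "--image", str(image),
            "--prompt", prompt,
            "--output", str(output_dir / f"{image.stem}_tree.json"),
            "--output_image", str(output_dir / f"{image.stem}_final.png"),
            "--alpha", "0",
        ],
        check=True,
    )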

Running Individual Components

The main functions in the following scripts need to be uncommented, and the paths, hyperparameters, and API keys must be modified before execution.

1. Generate a Subtask Tree

Modify subtask_tree.py by providing the input image path and prompt, then run:

python subtask_tree.py  
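The generated subtask tree is saved as JSON (Tree.json in the examples above). Before moving on to the Tool Subgraph, it can be inspected with a few lines of standard-library Python; the schema comes from subtask_tree.py, so print it rather than assuming field names.

import json

# Pretty-print the subtask tree produced by subtask_tree.py.
with open("Tree.json") as f:
    tree = json.load(f)

print(json.dumps(tree, indent=2))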

2. Build a Tool Subgraph

Modify tool_subgraph.py to use the generated Tree.json, then execute:

python tool_subgraph.py  

3. Run A* Search for Optimal Toolpath

Modify astar_search.py with updated paths and hyperparameters, then run:

python astar_search.py  
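astar_search.py runs the search over the Tool Subgraph built in the previous step. To make the mechanics concrete, the snippet below is a self-contained toy A* over a hand-made graph, reusing the alpha-weighted trade-off sketched in Features; the node names, costs, qualities, and the zero heuristic are all invented for illustration and are not the repository's API.

import heapq

# Toy tool graph: node -> {successor: (execution cost, expected quality)}; all values invented.
GRAPH = {
    "start":         {"detect_yolo": (0.2, 0.90), "detect_dino": (0.4, 0.95)},
    "detect_yolo":   {"inpaint_sd": (0.6, 0.85)},
    "detect_dino":   {"inpaint_sd": (0.6, 0.85), "inpaint_dalle": (0.9, 0.95)},
    "inpaint_sd":    {"goal": (0.0, 1.0)},
    "inpaint_dalle": {"goal": (0.0, 1.0)},
}

def edge_cost(exec_cost, quality, alpha):
    # Same illustrative trade-off as in the Features sketch.
    return alpha * exec_cost + (1 - alpha) * (1 - quality)

def astar(start, goal, alpha=0.3, heuristic=lambda n: 0.0):
    # Standard A*: expand the frontier node with the lowest g(n) + h(n).
    # A zero heuristic makes this equivalent to Dijkstra's algorithm.
    frontier = [(heuristic(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, (cost, quality) in GRAPH.get(node, {}).items():
            new_g = g + edge_cost(cost, quality, alpha)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return None, float("inf")

print(astar("start", "goal"))  # lowest-combined-cost toolpath from start to goal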

4. Visualize the Process

A step-by-step live example can be found in Demo.ipynb, which provides an interactive Jupyter Notebook for understanding the workflow.


Directory Structure

CoSTAR/
├── checkpoints/
│   ├── checkpoints.txt
├── configs/
│   ├── tools.yaml
├── inputs/
│   ├── 40.jpeg
├── outputs/
│   ├── final.png
├── prompts/
│   ├── 40.txt
├── requirements/
│   ├── craft.txt
│   ├── deblurgan.txt
│   ├── easyocr.txt
│   ├── google_cloud.txt
│   ├── groundingdino.txt
│   ├── magicbrush.txt
│   ├── realesrgan.txt
│   ├── sam.txt
│   ├── stability.txt
│   ├── yolo.txt
├── results/
│   ├── final.png
│   ├── img1.png
│   ├── img2.png
│   ├── img3.png
│   ├── img4.png
│   ├── img5.png
├── tools/
│   ├── dalleimage.py
│   ├── groundingdino.py
│   ├── sam.py
│   ├── stabilityoutpaint.py
│   ├── yolov7.py
│   └── ...
├── .gitignore
├── LICENSE
├── README.md
├── Demo.ipynb
├── run.py
├── subtask_tree.py
├── tool_subgraph.py
├── astar_search.py

Citation

If you find this work useful, please cite our paper:

@misc{gupta2025costaastcostsensitivetoolpathagent,
      title={CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing}, 
      author={Advait Gupta and NandaKiran Velaga and Dang Nguyen and Tianyi Zhou},
      year={2025},
      eprint={2503.10613},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10613}, 
}

License

BSD 3-Clause "New" or "Revised" License


Languages

Jupyter Notebook 96.9%, Python 3.1%