Text-Based-Image-Editing

This project combines three computer vision foundation models, Segment Anything Model (SAM), Stable Diffusion, and Grounding DINO, to edit and manipulate images from text. Grounding DINO first performs zero-shot object detection driven by the textual input. SAM then extracts segmentation masks from the detected bounding boxes. Finally, these masks guide Stable Diffusion to replace the masked regions with contextually appropriate content derived from the text prompt, resulting in a cohesive text-based image editing pipeline.
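The three steps can be sketched roughly as follows. This is a minimal, illustrative version of the pipeline, not the repository's exact code: the checkpoint names (ViT-H SAM, Swin-T Grounding DINO, runwayml/stable-diffusion-inpainting), the detection thresholds, and the 512x512 resizing are assumptions.

# Minimal sketch of the detect -> segment -> inpaint pipeline described above.
# Checkpoints, thresholds, and the 512x512 resize are assumptions.
import numpy as np
import torch
from PIL import Image
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from segment_anything import sam_model_registry, SamPredictor
from diffusers import StableDiffusionInpaintPipeline

def edit_image(img_path, selected_object, prompt, output_path):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Zero-shot detection of the selected object with Grounding DINO.
    dino = load_model(
        "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
        "groundingdino_swint_ogc.pth",
        device=device,
    )
    image_source, image = load_image(img_path)  # HxWx3 uint8 array, normalized tensor
    boxes, logits, phrases = predict(
        model=dino, image=image, caption=selected_object,
        box_threshold=0.35, text_threshold=0.25, device=device,
    )
    if len(boxes) == 0:
        raise ValueError(f"'{selected_object}' was not detected in {img_path}")

    # Convert the first normalized cxcywh box to pixel xyxy coordinates for SAM.
    h, w, _ = image_source.shape
    boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])

    # 2) Segmentation mask from the detected box with SAM.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(image_source)
    masks, _, _ = predictor.predict(box=boxes_xyxy[0].numpy(), multimask_output=False)
    mask = Image.fromarray(masks[0].astype(np.uint8) * 255).resize((512, 512))

    # 3) Stable Diffusion inpainting replaces the masked region with the prompt content.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting"
    ).to(device)
    init = Image.fromarray(image_source).resize((512, 512))
    result = pipe(prompt=prompt, image=init, mask_image=mask).images[0]
    result.save(output_path)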

Install Requirements

First, install the Python requirements and then install Grounding DINO from source.

pip install -r requirements.txt

Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .

Run download_files.py to download the pre-trained model weights

python download_files.py
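As a rough, hypothetical equivalent of this step (the exact checkpoints the script fetches are an assumption), it might download the standard SAM ViT-H checkpoint and the Swin-T Grounding DINO weights:

# Hypothetical sketch of the weight download step; the actual files
# fetched by download_files.py may differ.
import urllib.request

WEIGHTS = {
    "sam_vit_h_4b8939.pth":
        "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "groundingdino_swint_ogc.pth":
        "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
}

for filename, url in WEIGHTS.items():
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)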

Run & Usage

Run main.py with the following command

python main.py --img_path="input image" --selected_object="your selected object" --prompt="your prompt" --output_path="output path"
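For instance, to replace the cars in the first example image below with a different vehicle, a call could look like the following (the object and prompt values are purely illustrative):

python main.py --img_path="cars.png" --selected_object="car" --prompt="a red sports car" --output_path="edited_cars.png"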

Examples

Example 1: cars.png

Example 2: flowers.png

More Details

Grounding DINO

Segment Anything Model

Stable Diffusion
