Text-Based-Image-Editing

This project combines three computer vision foundation models, Segment Anything Model (SAM), Stable Diffusion, and Grounding DINO, to edit and manipulate images from text. Grounding DINO first performs zero-shot object detection driven by the textual input. SAM then extracts segmentation masks from the detected bounding boxes. Finally, these masks guide Stable Diffusion to replace the masked regions with contextually appropriate content derived from the text prompt, resulting in a cohesive text-based image editing pipeline.
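The three steps can be sketched roughly as follows. This is a minimal, illustrative version of the pipeline, not the repository's exact code: the checkpoint names (ViT-H SAM, Swin-T Grounding DINO, runwayml/stable-diffusion-inpainting), the detection thresholds, and the 512x512 resizing are assumptions.

# Minimal sketch of the detect -> segment -> inpaint pipeline described above.
# Checkpoints, thresholds, and the 512x512 resize are assumptions.
import numpy as np
import torch
from PIL import Image
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from segment_anything import sam_model_registry, SamPredictor
from diffusers import StableDiffusionInpaintPipeline

def edit_image(img_path, selected_object, prompt, output_path):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Zero-shot detection of the selected object with Grounding DINO.
    dino = load_model(
        "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
        "groundingdino_swint_ogc.pth",
        device=device,
    )
    image_source, image = load_image(img_path)  # HxWx3 uint8 array, normalized tensor
    boxes, logits, phrases = predict(
        model=dino, image=image, caption=selected_object,
        box_threshold=0.35, text_threshold=0.25, device=device,
    )
    if len(boxes) == 0:
        raise ValueError(f"'{selected_object}' was not detected in {img_path}")

    # Convert the first normalized cxcywh box to pixel xyxy coordinates for SAM.
    h, w, _ = image_source.shape
    boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])

    # 2) Segmentation mask from the detected box with SAM.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(image_source)
    masks, _, _ = predictor.predict(box=boxes_xyxy[0].numpy(), multimask_output=False)
    mask = Image.fromarray(masks[0].astype(np.uint8) * 255).resize((512, 512))

    # 3) Stable Diffusion inpainting replaces the masked region with the prompt content.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting"
    ).to(device)
    init = Image.fromarray(image_source).resize((512, 512))
    result = pipe(prompt=prompt, image=init, mask_image=mask).images[0]
    result.save(output_path)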

Install Requirements

First, install the Python requirements and then install Grounding DINO from source.

pip install -r requirements.txt

Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .

Run download_files.py to download the pre-trained model weights

python download_files.py
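As a rough, hypothetical equivalent of this step (the exact checkpoints the script fetches are an assumption), it might download the standard SAM ViT-H checkpoint and the Swin-T Grounding DINO weights:

# Hypothetical sketch of the weight download step; the actual files
# fetched by download_files.py may differ.
import urllib.request

WEIGHTS = {
    "sam_vit_h_4b8939.pth":
        "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "groundingdino_swint_ogc.pth":
        "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
}

for filename, url in WEIGHTS.items():
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)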

Run & Usage

Run main.py with the following command

python main.py --img_path="input image" --selected_object="your selected object" --prompt="your prompt" --output_path="output path"
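For instance, to replace the cars in the first example image below with a different vehicle, a call could look like the following (the object and prompt values are purely illustrative):

python main.py --img_path="cars.png" --selected_object="car" --prompt="a red sports car" --output_path="edited_cars.png"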

Examples

Example 1: cars.png

Example 2: flowers.png

More Details

Grounding DINO

Segment Anything Model

Stable Diffusion
