txt2imghd
txt2imghd is a port of the GOBIG mode from progrockdiffusion applied to Stable Diffusion, with Real-ESRGAN as the upscaler. It creates detailed, higher-resolution images by first generating an image from a prompt, upscaling it, running img2img on smaller pieces of the upscaled image, and then blending the results back into the original image.
txt2imghd with default settings has the same VRAM requirements as regular Stable Diffusion, although generation of the detailed images will take longer.
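The "smaller pieces" step can be pictured as cutting the upscaled image into overlapping tiles. The sketch below is only an illustration of that idea, not the script's actual slicing code: the 512-pixel tile size and the helper names `tile_starts` and `tile_boxes` are assumptions, and the 128-pixel overlap mirrors the default of `--gobig_overlap`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so neighbouring tiles share pixels."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single tile already covers this axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int,
               tile: int = 512, overlap: int = 128) -> List[Box]:
    """Hypothetical helper: crop boxes covering the whole image, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1024, 1024):
        print(box)
```

Each box would be run through img2img on its own, and the overlapping borders give the blending step room to hide the seams between tiles.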
Installation
- Have a working repository of Stable Diffusion
- Copy txt2imghd.py into scripts/
- Download the appropriate release of Real-ESRGAN (the respective realesrgan-ncnn-vulkan .zip for your OS) and unzip it into the root of your Stable Diffusion repository
Running
txt2imghd has most of the same parameters as txt2img. ddim_steps has been renamed to steps. The strength parameter controls how much detailing to do (between 0.0 and 1.0). If no prompt is given on the command line, the program will ask for it as input.
python scripts/txt2imghd.py
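When driving the script from other Python code, it can help to assemble the command as an argument list rather than a single string. The sketch below is a minimal example that uses only flags documented in this README (`--prompt`, `--strength`, `--passes`); the helper name `build_command` and the sample prompt are made up for illustration.

```python
import shlex
from typing import List


def build_command(prompt: str, strength: float = 0.3, passes: int = 1) -> List[str]:
    """Hypothetical helper: argument list for one txt2imghd run."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return [
        "python", "scripts/txt2imghd.py",
        "--prompt", prompt,
        "--strength", str(strength),
        "--passes", str(passes),
    ]


if __name__ == "__main__":
    cmd = build_command("a lighthouse at dusk")
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the root of the Stable Diffusion repository, which avoids shell-quoting problems with prompts that contain spaces or quotes.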
txt2imghd will output three images: the original Stable Diffusion image, the upscaled version (denoted by a u suffix), and the detailed version (denoted by the ud suffix).
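As a rough sketch of the naming scheme, the helper below derives the three file names for one run. It assumes a five-digit, zero-padded index (as in the "00003" example given for `--generated`) and a `.png` extension; both are assumptions and the script's real names may differ. `output_names` is a hypothetical helper, not part of txt2imghd.

```python
from typing import Dict


def output_names(index: int, ext: str = "png") -> Dict[str, str]:
    """Hypothetical helper: file names for the three images of one run."""
    base = f"{index:05d}"  # assumed zero-padded index, e.g. 3 -> "00003"
    return {
        "original": f"{base}.{ext}",
        "upscaled": f"{base}u.{ext}",   # "u" suffix
        "detailed": f"{base}ud.{ext}",  # "ud" suffix
    }


if __name__ == "__main__":
    for kind, name in output_names(3).items():
        print(kind, name)
```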
If you're running into issues with WatermarkEncoder, install the invisible-watermark package (which provides WatermarkEncoder) in your ldm environment with
pip install invisible-watermark
Optional Parameters
A selection of useful parameters to be appended after python scripts/txt2imghd.py:
--prompt
the prompt to render (in quotes), examples below
--img
only do detailing, using the path to an existing image (image will also be copied to output dir)
--generated
only do detailing, on an image in the output folder, using the image's index (example "00003")
--n_iter 25
number of images to generate
default = 1
--gobig_overlap
overlap size for GOBIG
default = 128
--detail_steps
number of sampling steps when detailing
default = 150
--wm
watermark text using WatermarkEncoder
default = "txt2imghd"
--passes
number of upscaling/detailing passes
default = 1
--strength
strength for noising/unnoising. 1.0 corresponds to full destruction of information in init image (especially useful when using an existing image)
default = 0.3
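To get a feel for how `--strength` and `--detail_steps` interact: the stock Stable Diffusion img2img script turns strength into a number of noising steps by multiplying it with the step count and truncating. The sketch below assumes txt2imghd follows the same convention with `detail_steps`; that is an assumption, not something this README confirms, and `detail_encode_steps` is a hypothetical helper.

```python
def detail_encode_steps(strength: float, detail_steps: int = 150) -> int:
    """Noising steps applied to each piece, assuming the img2img convention."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    # Truncate toward zero, as int() does in the stock img2img script.
    return int(strength * detail_steps)


if __name__ == "__main__":
    for s in (0.3, 0.5, 1.0):
        print(s, detail_encode_steps(s))
```

Under that assumption, the defaults (strength 0.3, 150 detail steps) noise each piece for 45 steps, while a strength of 1.0 uses all 150 and discards the information in the init image.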