SD-explorer

a toy project to explore Stable Diffusion locally through a nodeJS server.

demo video: result.mp4

foreword

this is a toy project to run the Stable Diffusion model locally. if you're after something more solid, I'd suggest you use WebUI instead.

as of now, this is essentially a bridge between NodeJS and the huggingface diffusers. you'll find all the background information and interactive examples on the huggingface diffusers colab.

pre-requisite :

  • first and foremost: an Nvidia GPU with 10GB+ of memory, RTX 20XX or better (I think...) if you don't have this, no need to install this repo
  • up to date GPU drivers and the matching CUDA + cuDNN drivers
  • the model is handled by the diffusers lib, follow the instructions and requirements here
  • NB: the diffusers lib requires users to log in to huggingface and accept the license terms of the Stable Diffusion model (see the model card)
  • Python 3.X (I use 3.9.5 on Windows) + PIP
  • NodeJS and NPM
  • a fair amount of space (~10GB+) to store all the models

in other words, if you managed to run the huggingface diffusers locally, from a CLI, you should be good to go. if not, I can't really help, as the setup varies immensely between machines, OSes, CUDA and Python versions.
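
if you want a quick sanity check before going further, the snippet below (a minimal sketch, assuming torch is installed with CUDA support) just confirms that Python actually sees the GPU and how much memory it has:

import torch

# everything runs on CUDA, so this should print True, the name of your GPU and ~10GB or more
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
print(round(torch.cuda.get_device_properties(0).total_memory / 1024**3), "GB")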

upscaling:

the upscaling option will require you to install Real-ESRGAN and GFPGAN (for face enhancement)

pip install realesrgan==0.2.5.0
pip install gfpgan==0.2.4

feel free to install more models from the Real-ESRGAN model zoo; make sure to save them into the python/models/ folder.
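
for reference, upscaling a result outside of the app looks roughly like this (a sketch only: the model file names and paths are examples, and the constructor arguments differ slightly between realesrgan versions):

import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer
from gfpgan import GFPGANer

# assumption: the weights were downloaded into python/models/ as described above
upsampler = RealESRGANer(
    scale=4,
    model_path="python/models/RealESRGAN_x4plus.pth",
    model=RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4),
    half=True,
)
face_enhancer = GFPGANer(
    model_path="python/models/GFPGANv1.pth",
    upscale=4,
    bg_upsampler=upsampler,
)

img = cv2.imread("result.png")
upscaled, _ = upsampler.enhance(img, outscale=4)              # plain 4x upscaling
_, _, enhanced = face_enhancer.enhance(img, paste_back=True)  # upscaling + face enhancement
cv2.imwrite("result_x4.png", enhanced)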

depth inference

if you want to compute the depth of an image (to perform the inpainting), you'll need to install MiDaS
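
here's roughly what depth inference looks like with MiDaS (a sketch using the torch.hub distribution of the model; the repo may load it differently and the file names are examples):

import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to("cuda").eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

img = cv2.cvtColor(cv2.imread("region.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img).to("cuda"))
    # resize the predicted depth back to the source resolution
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()

cv2.imwrite("depth.png", cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8"))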

image to prompts

the clip-interrogator lets you drag & drop an image onto the prompt field and get a prompt to generate similar images. it requires CLIP and BLIP, and you should also download this model to "python/models/model__base_caption.pth" as the asterisk (*) prevents it from downloading automatically on Windows...
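
roughly, the interrogator embeds the dropped image with CLIP and keeps the prompt fragments whose embeddings are closest to it, on top of a BLIP caption. a minimal sketch of that ranking step, assuming openai's clip package and made-up candidate fragments:

import clip
import torch
from PIL import Image

model, preprocess = clip.load("ViT-L/14", device="cuda")
image = preprocess(Image.open("dropped.png")).unsqueeze(0).to("cuda")
candidates = ["oil painting", "octane render", "35mm photograph"]  # example fragments
tokens = clip.tokenize(candidates).to("cuda")

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(0)  # cosine similarity image <-> fragments

print(candidates[scores.argmax().item()])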

warning:

I removed the NSFW filter as it sometimes returns empty images. this means that if you provide 🔞 prompts, the model will deliver 🔞 contents. consider yourself warned.

run!

install the npm dependencies with npm install

the JS app is located in src/, you can (must) build it with rollup --config rollup.config.js

run the server with npm start or npm run sd or node server.js

open a web browser and go to http://localhost:8080/

to test on another device (mobile, tablet), set your LAN IP in the server.js file.

overview

the concept is to generate images using text written in natural language: the prompts.

you can write them inside the textareas or press the randomize button below to use the built-in prompts list from Prompt Parrot! 🦜 to quickly check various styles and configs.

the app works with a single 2D canvas and the draggable red rectangle is the "region" that will be processed by Python. this allows for interesting and intuitive editing capabilities: start with an inference, then inpaint a larger area, then use image to image on portions of the resulting composition, rinse and repeat.

inference

the inference tab creates an image "from scratch" given a number of steps (think of it as a level of detail) and a guidance rate (think of it as the 'fidelity' to the prompt). a higher step count takes longer but produces better images; a lower step count computes faster but the results are usually blurry as f. guidance doesn't impact the computation time; a lower guidance rate will take more risks (same goes for the img2img). by default the seed is set to -1, which means the generator will be randomized for each call; if you set it to a given number, the model will always produce the same image unless you change one of the parameters.
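
in diffusers terms, what the inference tab sends to python boils down to something like this (a sketch only: the model id is an example and the actual server code may differ):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # a fixed seed -> reproducible output
image = pipe(
    "a cathedral made of ice, volumetric lighting",
    num_inference_steps=50,  # "steps": more detail but slower
    guidance_scale=7.5,      # "guidance": fidelity to the prompt
    generator=generator,
).images[0]
image.save("inference.png")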

image to image

the img2img tab crops the rectangle from the main canvas and drives it towards the prompt at a given strength (guidance and seed work as above). that's where the undo button shines: it is very hard to tune...
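
the equivalent diffusers call looks roughly like this (a sketch; on older diffusers versions the image argument is called init_image, and the model id is an example):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("region.png").convert("RGB").resize((512, 512))
# strength ~0 keeps the source almost untouched, ~1 nearly ignores it
image = pipe("same scene, watercolor style", image=init, strength=0.6, guidance_scale=7.5).images[0]
image.save("img2img.png")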

inpainting

the inpainting tab works a bit like the image to image. the difference is that you need to draw a mask in the region. the mask is sent along with the source image and the model will only drive the painted area towards the prompt. there are 3 extra params on this tab: size, the brush size; softness, the strength of the brush gradient; and alpha, the opacity of the stroke. this is a rather incredible feature when you get used to it (addictive too).
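
on the python side, masked generation boils down to this (again a sketch: the inpainting checkpoint named below is an example, not necessarily the one the server uses):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init = Image.open("region.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = repaint, black = keep
image = pipe("a stained glass window", image=init, mask_image=mask).images[0]
image.save("inpaint.png")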

region size

it is recommended to keep 512 in width or height, smaller sizes (say 256x256) won't give good results (they'll be oversaturated). I limited the rect to 512x512 as it may become impossible for Node to receive Blobs bigger than that.

feedback welcome :)

UI

all the functionalities are split across the different panels of a single GUI; the parameters vary depending on what you want to do.

