Mental Diffusion

Stable Diffusion headless and web interface
Home page: https://nimadez.github.io/mental-diffusion/
Version 0.1.6 alpha
Changelog

Features

  • Accelerated Torch 2.0
  • Lightweight layer on top of Diffusers
  • Fast startup and rendering
  • Runs on GPU or CPU (slower)
  • Easy-to-use web interface
  • Headless console for experts
  • WebSocket server
  • Safetensors-only checkpoints
  • JSON configuration file
  • Schedulers (ddpm, ddim, pndm, lms, euler, euler_anc; see the sketch after this list)
  • Text to image
  • Image to image
  • Image inpainting
  • Image outpainting
  • Face restoration
  • 4x upscaling
  • Batch rendering
  • PNG metadata
  • Automatic pipeline switching
  • File and base64 image inputs (PNG)
  • Works offline, proxy supported
  • No safety checker
  • No miners, trackers, or telemetry
  • Optimized for affordable hardware
  • Optimized for slow internet connections
  • Custom VAE support
  • LoRA support
  • ControlNet support
  • Hypernetwork support
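Since Mental Diffusion is a thin layer over Diffusers, the scheduler names above presumably map onto Diffusers scheduler classes. A minimal sketch of such a mapping, assuming the diffusers version pinned below; this is an illustration, not the project's actual code:

from diffusers import (
    DDPMScheduler, DDIMScheduler, PNDMScheduler,
    LMSDiscreteScheduler, EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
)

# Hypothetical name-to-class table for the six scheduler names listed above
SCHEDULERS = {
    "ddpm": DDPMScheduler,
    "ddim": DDIMScheduler,
    "pndm": PNDMScheduler,
    "lms": LMSDiscreteScheduler,
    "euler": EulerDiscreteScheduler,
    "euler_anc": EulerAncestralDiscreteScheduler,
}

def set_scheduler(pipe, name):
    # Swap the pipeline's scheduler in place, reusing its existing config
    pipe.scheduler = SCHEDULERS[name].from_config(pipe.scheduler.config)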

Requirements

  • At least 16 GB RAM
  • NVIDIA GPU with CUDA compute capability (at least 4 GB of VRAM)
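A quick way to confirm that PyTorch can see your GPU, and how much memory it reports (a standalone check, not part of Mental Diffusion):

import torch

# Report the detected CUDA device and its total memory
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; rendering will fall back to CPU (slower)")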

Headless

Usage:    headless.py --arg "value"
Example:  headless.py -p "prompt" -out base64
Upscale:  headless.py -sr 0.1 -up true -i "/path-to-image.png" -of true
Metadata: headless.py -meta "/path-to-image.png"
Batch:    headless.py -p "prompt" -st 25 -b 10 -w 1200 -h 400
CKPT:     headless.py -p "prompt" -ck "deliberate_v2"

--help               show this help message and exit
--get        -get    fetch data from server by record id (int)

--batch      -b      number of images to render (def: 1)
--checkpoint -ck     set checkpoint by file name, null = default checkpoint (def: null)
--scheduler  -sc     ddpm, ddim, pndm, lms, euler, euler_anc (def: euler_anc)
--prompt     -p      positive prompt input
--negative   -n      negative prompt input
--width      -w      image width, must be divisible by 8 (def: 512)
--height     -h      image height, must be divisible by 8 (def: 512)
--seed       -s      seed number, 0 to randomize (def: 0)
--steps      -st     steps from 10 to 50, 20-25 is usually enough (def: 20)
--guidance   -g      guidance scale, how closely the image follows the prompt (def: 7.5)
--strength   -sr     how much the result may deviate from the original image (def: 0.5)
--image      -i      PNG file path or base64 PNG (def: '')
--mask       -m      PNG file path or base64 PNG (def: '')
--facefix    -ff     true/false, face restoration using gfpgan (def: false)
--upscale    -up     true/false, upscale using real-esrgan 4x (def: false)
--savefile   -sv     true/false, save image to PNG with metadata embedded (def: true)
--onefile    -of     true/false, save the final result only (def: false)
--outpath    -o      /path-to-directory (def: ./.temp)
--filename   -f      filename prefix (.png extension is not required)

--metadata   -meta   /path-to-image.png, extract metadata from PNG
--out        -out    stdout 'metadata' or 'base64' (def: metadata)

* When an image is provided, the pipeline switches to image-to-image
* When both an image and a mask are provided, the pipeline switches to inpainting
* Check server.log for previous records
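To illustrate the flags above, here is a sketch that shells out to headless.py and then reads the metadata back from the saved PNG with Pillow. The exact output filename is an assumption, since -f only sets a prefix:

import subprocess
from PIL import Image

# Render one image using the documented flags (prompt and paths are examples)
subprocess.run([
    "python", "headless.py",
    "-p", "a lighthouse at dusk",
    "-st", "25",
    "-w", "512",        # width and height must be divisible by 8
    "-h", "512",
    "-o", "./.temp",
    "-f", "md-demo",    # filename prefix; the final name may carry a suffix
], check=True)

# Saved PNGs embed their metadata as text chunks, which Pillow exposes;
# this should match what `headless.py -meta` prints (assumption)
image = Image.open("./.temp/md-demo.png")
print(image.text)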


Incorrect and correct mask image

Web Interface

The web interface is a prototype; minor bugs remain.

  • Online and offline webui
  • Drag and drop workflow
  • Image comparison A/B
    A - Front canvas (left)
    B - Background image (right)
  • Painting canvas (brush, line, eraser, mask, color picker)
  • Canvas editor (flip, hue, saturation, brightness, contrast, sepia, invert)
  • Styles editor (predefined keywords that are included in the metadata)
  • Guide the AI using text, brush strokes and color adjustments
  • Quick mask painting
  • Generates input and mask images for outpainting
  • Autosave prompts
  • Autosave PNG with metadata
  • Metadata pool (single or multiple PNG import)
  • Bake canvas to image
  • Pan and zoom canvas
  • Undo/redo for painting tools (brush, line, eraser, mask)
  • Your data is safe and can be loaded again as long as "Autosave File" is checked
  • To combine your painting with the underlying image pixels, bake the canvas
  • To outpaint, set the "Outpaint Padding" size; the initial image and mask are generated for you (set Strength to 1.0, and see the sketch after this list)
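If you script outpainting outside the web UI, something like this Pillow sketch approximates what the outpaint step prepares: a padded input canvas plus a matching mask. The white-means-repaint mask convention is an assumption here; compare against the mask figure above.

from PIL import Image

def make_outpaint_inputs(src_path, padding=128):
    # Pad the original image on all sides; the border is what gets outpainted
    src = Image.open(src_path).convert("RGB")
    w, h = src.size  # the padded size should remain divisible by 8
    canvas = Image.new("RGB", (w + 2 * padding, h + 2 * padding), "black")
    canvas.paste(src, (padding, padding))
    # Mask: white where the model should paint, black over the preserved image
    # (white-means-repaint is an assumption; check the mask figure above)
    mask = Image.new("L", canvas.size, 255)
    mask.paste(0, (padding, padding, padding + w, padding + h))
    return canvas, mask

Pass the two results to headless.py via -i and -m with -sr 1.0, matching the Strength 1.0 advice above.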


Schedulers compared at an equal step count; this many steps is not enough for PNDM/LMS


GFPGAN was applied to the LMS rendering above


Painting blood to guide the AI, with the GFPGAN result


About 10 inpaint renders; top is the original


Outpaint examples, padding 128 and 256


Outpaint padding 128; use inpainting to clean up errors

Mouse controls

Key            Action
Left Button    Drag, draw, select
Middle Button  Reset zoom
Right Button   Pan canvas
Wheel          Zoom canvas in/out

Keyboard shortcuts

Key            Action
Space          Toggle metadata pool
D              Drag tool
B              Brush tool
L              Line tool
E              Eraser tool
M              Mask tool
]              Increase tool size
[              Decrease tool size
+              Increase tool opacity
-              Decrease tool opacity
CTRL + Enter   Render/generate
CTRL + L       Load PNG metadata
CTRL + Z       Undo painting
CTRL + X       Redo painting

Installation

[ Automatic Installation ]

curl -o md-installer.py https://raw.githubusercontent.com/nimadez/mental-diffusion/main/installer/md-installer.py
python md-installer.py

[ Manual Installation ]

curl https://bootstrap.pypa.io/get-pip.py -k --ssl-no-revoke -o get-pip.py
python get-pip.py
python -m pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
python -m pip install accelerate==0.20.3
python -m pip install diffusers==0.18.2
python -m pip install transformers==4.30.2
python -m pip install omegaconf==2.3.0
python -m pip install safetensors==0.3.1
python -m pip install realesrgan==0.3.0
python -m pip install gfpgan==1.3.8
python -m pip install websockets==11.0.3

git clone https://github.com/nimadez/mental-diffusion.git
run.bat       -> start server (url: http://localhost:8011)
headless.py   -> use headless if you are familiar with consoles

* edit "config.json" to define model paths

[ Automatic One-Time Downloads ]

  • 200 MB GFPGAN weights (root directory)
  • 1.7 GB openai/clip-vit-large-patch14 (Hugging Face cache)
To avoid re-downloading the Hugging Face cache, point the HF_HOME environment variable at a persistent directory:
> setx HF_HOME path-to-dir\.cache\huggingface

Models

Some popular checkpoints:
v1-5-pruned-emaonly.safetensors
sd-v1-5-inpainting.ckpt
Deliberate_v2.safetensors
Deliberate-inpainting.safetensors
Reliberate.safetensors
Reliberate-inpainting.safetensors
dreamlike-diffusion-1.0.safetensors
dreamlike-photoreal-2.0.safetensors
Download at least one checkpoint to "models/checkpoints"

vae-ft-mse-840000-ema-pruned.safetensors (optional - to "models/vae")
GFPGANv1.4.pth (required - to "models/gfpgan")
RealESRGAN_x4plus.pth (required - to "models/realesrgan")

  • All .ckpt checkpoints are converted to .safetensors (security)
  • All checkpoints are converted to fp16 (smaller size; use prune.py, or see the sketch below)
  • All inpainting checkpoints must have "inpainting" in their filename
  • VAE is optional but recommended for optimal results
  • Back to the future, SD v1.x only!

  • I do not officially support any models
  • Visit Civitai.com for more SD 1.5 checkpoints
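prune.py itself is not reproduced here, but as a rough sketch, fp16 conversion of a flat .safetensors state dict can look like this (an illustration, not the repo's script):

import torch
from safetensors.torch import load_file, save_file

def to_fp16(src, dst):
    # Halve every float32 tensor; leave other dtypes untouched
    state = load_file(src)
    state = {k: v.half() if v.dtype == torch.float32 else v
             for k, v in state.items()}
    save_file(state, dst)

# Roughly halves the file size of an fp32 checkpoint
to_fp16("Deliberate_v2.safetensors", "Deliberate_v2-fp16.safetensors")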

Known Issues

Mental Diffusion keeps working offline if internet access is interrupted.
When a connection is available, however, some data is sent and received
while loading a checkpoint, because Hugging Face compares local files
against the hub.

FAQ

How to speed up rendering?
- Do not switch checkpoints constantly; let a checkpoint stay cached and be reused
- Open the NVIDIA Control Panel and set power management mode to "Adaptive"

Why does it give a connection error when loading the checkpoint?
Use a VPN, enable "use_proxy" in config.json, or disable your network
connection. (After disabling your network connection, do not also set
"use_proxy" to 1.)

Is SDXL supported?
SDXL requires 12 GB of video memory and is not currently supported.

History

0.1.5 -> back to the roots, major performance gain #1

- Mental Diffusion started with "sdkit" and later moved to Diffusers
- Created for my personal use

License

Code released under the MIT license.
