MegaEdit
A collection of works on inversion and diffusion image editing via feature/attention injection.
NOTE: this is not compatible with xFormers, but it does support sliced attention if you run into memory issues
This repo was originally based on prompt2prompt, but it now contains a number of improvements, implementations of other papers, and some of my own additions.
This includes:
- injection of convolutional features, as in Plug-and-Play (https://arxiv.org/abs/2211.12572)
- inversion: originally EDICT (https://github.com/salesforce/EDICT), though the arguments now passed reduce it to standard DDIM inversion
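For reference, a single deterministic DDIM inversion step just runs the DDIM update in reverse: predict x_0 from the current sample, then re-noise it toward the next (noisier) timestep. A minimal numpy sketch, with `eps` standing in for the UNet's noise prediction (this is a generic illustration of the math, not this repo's code):

```python
import numpy as np

def ddim_invert_step(x_t, eps, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM inversion step: x_t -> x_{t+1} (more noise).

    eps: the model's noise prediction for x_t (here just an array).
    alpha_bar_*: cumulative noise-schedule products at the two timesteps.
    """
    # Predict the clean image x_0 implied by the current sample.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise x_0 to the next timestep along the same deterministic trajectory.
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1.0 - alpha_bar_next) * eps
```

Because the update is deterministic, applying the same step with the timesteps swapped walks back to the original sample, which is what makes inversion-based editing possible.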
My own addons include:
- injecting an interpolation of the original and proposed features, on a schedule. This keeps the original features influencing the generation much further into sampling without fully taking it over; the gradual approach may confer benefits similar to pix2pix-zero (https://github.com/pix2pixzero/pix2pix-zero)
- split guidance scale. This lets inversion run without classifier-free guidance for stability, while editing uses a different guidance scale
- Gaussian-smoothed attention. The original intent was to let attention cover more ground before amplifying it; in practice I am also noticing less erratic detail and less of a photobashed look. See the examples below.
- (WIP) an attempt at gradient-free attend-and-excite by locally amplifying attention in a region of the image. This isn't optimal, since the original method optimizes the latents, but the hope is that giving special care to certain tokens yields a similar effect without adding much time/VRAM
- some other quality-of-life improvements for easy deployment and for demystifying some of the parameters
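The scheduled feature interpolation above can be pictured as a convex blend whose weight on the original features decays over the sampling steps. A hypothetical numpy sketch (the function name and the linear schedule are my illustration, not the repo's exact code):

```python
import numpy as np

def blend_features(orig_feat, new_feat, step, n_steps, stop_frac=0.8):
    """Interpolate original and proposed features on a linear schedule.

    Early steps lean heavily on the original features; their influence
    fades to zero by `stop_frac` of the way through sampling, so late
    steps are driven entirely by the edited features. Names/defaults
    here are illustrative assumptions.
    """
    w = max(0.0, 1.0 - step / (stop_frac * n_steps))  # weight on original features
    return w * orig_feat + (1.0 - w) * new_feat
```

Compared to hard injection that switches off at a fixed step, the ramp avoids an abrupt handover from original to proposed features.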
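Gaussian-smoothed attention amounts to blurring a token's 2D attention map before any amplification, spreading its mass over neighbouring pixels. A self-contained numpy sketch (the separable kernel, edge padding, and default sigma are my assumptions):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_attention(attn_map, sigma=1.0):
    """Blur a 2D cross-attention map with a separable Gaussian.

    Smoothing lets a token's attention cover more ground before it is
    amplified, which (anecdotally) reduces erratic, photobashed detail.
    """
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    # Pad, convolve rows then columns, then crop back to the input size.
    padded = np.pad(attn_map, radius, mode="edge")
    for axis in (0, 1):
        padded = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), axis, padded)
    return padded[radius:-radius, radius:-radius]
```

The blur preserves total attention mass away from the borders, so amplifying a smoothed map boosts a neighbourhood rather than a single spiky pixel.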
Usage:
- set up a torch environment of your choice
- git clone this repo
- pip install -r requirements.txt
- run the notebook!
Other editing examples
Usefulness of attention reweighting: an alternative to automatic1111's approach, which operates at the text encoder level, and a better solution when SD isn't listening to your prompt.
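Reweighting at the attention level boils down to scaling how much each pixel attends to a chosen token and renormalizing, rather than scaling the prompt embedding itself. A hypothetical numpy sketch of that core operation (shapes and names are my illustration):

```python
import numpy as np

def reweight_attention(attn_probs, token_idx, scale):
    """Scale one token's cross-attention weights and renormalize.

    attn_probs: (pixels, tokens) softmaxed attention; each row sums to 1.
    Scaling happens after the softmax, so the edit acts directly on how
    much each pixel attends to the token, not on the text embedding.
    """
    out = attn_probs.copy()
    out[:, token_idx] *= scale
    out /= out.sum(axis=1, keepdims=True)  # keep each row a distribution
    return out
```

Because the renormalization redistributes mass away from (or toward) the other tokens, a modest scale can noticeably shift the image even when prompt-level emphasis has no effect.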