MegaEdit
A collection of works on inversion and diffusion image editing via feature/attention injection.
NOTE: this is not compatible with xFormers, but it does support sliced attention if you run into memory issues
This repo was originally based on prompt2prompt, but it now contains a number of improvements, implementations of other papers, and some of my own additions.
This includes:
- injection of convolutional features, as in Plug-and-Play (https://arxiv.org/abs/2211.12572)
- inversion: originally EDICT (https://github.com/salesforce/EDICT), though the arguments now passed reduce it to standard DDIM inversion
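For reference, a single deterministic DDIM inversion step just runs the DDIM update in reverse: predict x_0 from the current sample, then re-noise it toward the next (noisier) timestep. A minimal numpy sketch, with `eps` standing in for the UNet's noise prediction (this is a generic illustration of the math, not this repo's code):

```python
import numpy as np

def ddim_invert_step(x_t, eps, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM inversion step: x_t -> x_{t+1} (more noise).

    eps: the model's noise prediction for x_t (here just an array).
    alpha_bar_*: cumulative noise-schedule products at the two timesteps.
    """
    # Predict the clean image x_0 implied by the current sample.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise x_0 to the next timestep along the same deterministic trajectory.
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1.0 - alpha_bar_next) * eps
```

Because the update is deterministic, applying the same step with the timesteps swapped walks back to the original sample, which is what makes inversion-based editing possible.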
My own addons include:
- injecting an interpolation of the original and proposed features, on a schedule. This keeps the original features influencing the generation much further into sampling without fully taking it over; the gradual approach may confer benefits similar to pix2pix-zero (https://github.com/pix2pixzero/pix2pix-zero)
- split guidance scale. This lets inversion run without classifier-free guidance for stability, while editing uses a different guidance scale
- Gaussian-smoothed attention. The original intent was to let attention cover more ground before amplifying it; in practice I am also noticing less erratic detail and less of a photobashed look. See the examples below.
- (WIP) an attempt at gradient-free attend-and-excite by locally amplifying attention in a region of the image. This isn't optimal, since the original method optimizes the latents, but the hope is that giving special care to certain tokens yields a similar effect without adding much time/VRAM
- some other quality-of-life improvements for easy deployment and for demystifying some of the parameters
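The scheduled feature interpolation above can be pictured as a convex blend whose weight on the original features decays over the sampling steps. A hypothetical numpy sketch (the function name and the linear schedule are my illustration, not the repo's exact code):

```python
import numpy as np

def blend_features(orig_feat, new_feat, step, n_steps, stop_frac=0.8):
    """Interpolate original and proposed features on a linear schedule.

    Early steps lean heavily on the original features; their influence
    fades to zero by `stop_frac` of the way through sampling, so late
    steps are driven entirely by the edited features. Names/defaults
    here are illustrative assumptions.
    """
    w = max(0.0, 1.0 - step / (stop_frac * n_steps))  # weight on original features
    return w * orig_feat + (1.0 - w) * new_feat
```

Compared to hard injection that switches off at a fixed step, the ramp avoids an abrupt handover from original to proposed features.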
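Gaussian-smoothed attention amounts to blurring a token's 2D attention map before any amplification, spreading its mass over neighbouring pixels. A self-contained numpy sketch (the separable kernel, edge padding, and default sigma are my assumptions):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_attention(attn_map, sigma=1.0):
    """Blur a 2D cross-attention map with a separable Gaussian.

    Smoothing lets a token's attention cover more ground before it is
    amplified, which (anecdotally) reduces erratic, photobashed detail.
    """
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    # Pad, convolve rows then columns, then crop back to the input size.
    padded = np.pad(attn_map, radius, mode="edge")
    for axis in (0, 1):
        padded = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), axis, padded)
    return padded[radius:-radius, radius:-radius]
```

The blur preserves total attention mass away from the borders, so amplifying a smoothed map boosts a neighbourhood rather than a single spiky pixel.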
Usage:
- set up a torch environment of your choice
- git clone this repo
- pip install -r requirements.txt
- run the notebook!
Other editing examples
Usefulness of attention reweighting: an alternative to automatic1111's approach, which operates at the text encoder level, and a better solution when SD isn't listening to your prompt.
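Reweighting at the attention level boils down to scaling how much each pixel attends to a chosen token and renormalizing, rather than scaling the prompt embedding itself. A hypothetical numpy sketch of that core operation (shapes and names are my illustration):

```python
import numpy as np

def reweight_attention(attn_probs, token_idx, scale):
    """Scale one token's cross-attention weights and renormalize.

    attn_probs: (pixels, tokens) softmaxed attention; each row sums to 1.
    Scaling happens after the softmax, so the edit acts directly on how
    much each pixel attends to the token, not on the text embedding.
    """
    out = attn_probs.copy()
    out[:, token_idx] *= scale
    out /= out.sum(axis=1, keepdims=True)  # keep each row a distribution
    return out
```

Because the renormalization redistributes mass away from (or toward) the other tokens, a modest scale can noticeably shift the image even when prompt-level emphasis has no effect.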