notes for prompt engineering
- images
- video
- img2img of famous movie scenes (lalaland)
- virtual fashion (karenxcheng)
- evolution of scenes (xander)
- outpainting https://twitter.com/orbamsterdam/status/1568200010747068417?s=21&t=rliacnWOIjJMiS37s8qCCw
- webUI img2img collaboration https://twitter.com/_akhaliq/status/1563582621757898752
- image to video with rotation https://twitter.com/TomLikesRobots/status/1571096804539912192
- "prompt paint" https://twitter.com/1littlecoder/status/1572573152974372864
- music videos video, colab
- direct text2video project
- text-to-3d https://twitter.com/_akhaliq/status/1575541930905243652
- gpt3 applications
- text to graphviz https://twitter.com/goodside/status/1561549768987496449?s=21&t=rliacnWOIjJMiS37s8qCCw
- suspedning to python for math
- https://www.gwern.net/GPT-3#prompts-as-programming
- beginner
- openAI prompt tutorial https://beta.openai.com/docs/quickstart/add-some-examples
- DALLE2 prompt writing book http://dallery.gallery/wp-content/uploads/2022/07/The-DALL%C2%B7E-2-prompt-book-v1.02.pdf
- https://medium.com/nerd-for-tech/prompt-engineering-the-career-of-future-2fb93f90f117
- https://wiki.installgentoo.com/wiki/Stable_Diffusion overview
- https://www.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
- https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/
- https://andymatuschak.org/prompts/
- Intermediate
- go through all the GPT3 examples https://beta.openai.com/examples
- play with the smaller GPT3 models https://beta.openai.com/docs/models/gpt-3
- technique: self-asking, two step prompts https://twitter.com/OfirPress/status/1577302998136819713
- chain of thought prompting https://twitter.com/OfirPress/status/1577303423602790401
- and deploy GPT2 https://huggingface.co/gpt2
- Prompt structure with extensive examples
- review feedback extraction https://www.youtube.com/watch?v=3EjtHs_lXnk&t=1009s
- Ask Me Anything prompting
- using gpt3 for text2image prompts https://twitter.com/fabianstelzer/status/1554229347506176001
- DALLE2 asset generation + inpainting https://twitter.com/aifunhouse/status/1576202480936886273?s=20&t=5EXa1uYDPVa2SjZM-SxhCQ
- write a blogpost with GPT3 https://www.youtube.com/watch?v=NC7990PmDfM
- suhail journey https://twitter.com/Suhail/status/1541276314485018625?s=20&t=X2MVKQKhDR28iz3VZEEO8w
- quest for photorealism https://www.reddit.com/r/StableDiffusion/comments/x9zmjd/quest_for_ultimate_photorealism_part_2_colors/
- settings tweaking https://www.reddit.com/r/StableDiffusion/comments/x3k79h/the_feeling_of_discovery_sd_is_like_a_great_proc/
- seed selection https://www.reddit.com/r/StableDiffusion/comments/x8szj9/tutorial_seed_selection_and_the_impact_on_your/
- minor parameter parameter difference study (steps, clamp_max, ETA, cutn_batches, etc) https://twitter.com/KyrickYoung/status/1500196286930292742
- Advanced
- integrating Google Search with GPT3: https://twitter.com/OfirPress/status/1577302733383925762
- teach AI how to fish - You are X, you can do Y: https://github.com/nat/natbot/blob/main/natbot.py
- play with gpt-neoX and gpt-j https://neox.labml.ai/playground
- defense against prompt injection https://twitter.com/goodside/status/1578278974526222336
- https://creator.nightcafe.studio/vqgan-clip-keyword-modifier-comparison VQGAN+CLIP Keyword Modifier Comparison We compared 126 keyword modifiers with the same prompt and initial image. These are the results.
- Google released PartiPrompts as a benchmark: https://parti.research.google/ "PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects."
- Video tutorials
- Misc
- Prompt Generator: https://huggingface.co/succinctly/text2image-prompt-generator
- This is a GPT-2 model fine-tuned on the succinctly/midjourney-prompts dataset, which contains 250k text prompts that users issued to the Midjourney text-to-image service over a month period. This prompt generator can be used to auto-complete prompts for any text-to-image model (including the DALL·E family)
- Prompt Parrot https://colab.research.google.com/drive/1GtyVgVCwnDfRvfsHbeU0AlG-SgQn1p8e?usp=sharing
- This notebook is designed to train language model on a list of your prompts,generate prompts in your style, and synthesize wonderful surreal images! ✨
- https://twitter.com/stuhlmueller/status/1575187860063285248
- The Interactive Composition Explorer (ICE), a Python library for writing and debugging compositional language model programs https://github.com/oughtinc/ice
- The Factored Cognition Primer, a tutorial that shows using examples how to write such programs https://primer.ought.org
- Prompt Explorer
- Prompt generator https://www.aiprompt.io/
- GUIs
- 🌟 https://github.com/AUTOMATIC1111/stable-diffusion-webui
- notable fork https://github.com/sd-webui/stable-diffusion-webui
- https://nmkd.itch.io/t2i-gui
- windows https://github.com/razzorblade/stable-diffusion-gui
- mac https://github.com/divamgupta/diffusionbee-stable-diffusion-ui
- https://github.com/Fictiverse/StableDiffusion-Windows-GUI
- https://www.reddit.com/r/StableDiffusion/comments/xawppz/stable_diffusion_windows_gui_08_release/
- https://www.reddit.com/r/StableDiffusion/comments/xc65as/face_swapping/ -> interface which uses diffusers with https://github.com/leszekhanusz/diffusion-ui
- Deforum Diffusion https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
- Disco Diffusion https://news.ycombinator.com/item?id=32660138 "A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations."
- Waifu Diffusion "A model trained on danbooru (anime/manga drawing site with also lewds and nsfw on it) over 56k images.Produces FAR BETTER results if you're interested in getting manga and anime stuff out of stable diffusion."
- Edsynth and DAIN for coherence
- FILM: Frame Interpolation for Large Motion (github)
- Depth Mapping
- Art program plugins
- Krita: https://github.com/nousr/koi
- GIMP https://80.lv/articles/a-new-stable-diffusion-plug-in-for-gimp-krita/
- Photoshop: https://old.reddit.com/r/StableDiffusion/comments/wyduk1/show_rstablediffusion_integrating_sd_in_photoshop/
- Figma: https://twitter.com/RemitNotPaucity/status/1562319004563173376?s=20&t=fPSI5JhLzkuZLFB7fntzoA
- collage tool https://twitter.com/genekogan/status/1555184488606564353
- Papers
- 2015: Deep Unsupervised Learning using Nonequilibrium Thermodynamics founding paper of diffusion models
- Textual Inversion: https://arxiv.org/abs/2208.01618
- 2017: Attention is all you need
- https://dreambooth.github.io/
- very good BLOOM model overview
Live updated list: https://www.reddit.com/r/StableDiffusion/comments/wqaizj/list_of_stable_diffusion_systems/
- StableDiffusion Discord https://discord.com/invite/stablediffusion
- Deforum Discord https://discord.gg/upmXXsrwZc
- Lexica Discord https://discord.com/invite/bMHBjJ9wRh
- Midjourney
- https://promptbase.com/ Selling prompts that produce desirable results
- Prompt Galleries
- https://arthub.ai/
- 🌟 https://lexica.art/
- https://pagebrain.ai/promptsearch/
- https://avyn.com/
- https://dallery.gallery/
- The Ai Art: gallery for modifiers.
- urania.ai: Top 500 Artists gallery, sorted by image count. With modifiers/styles.
- Generrated: DALL•E 2 table gallery sorted by visual arts media.
- Artist Studies by @remi_durant: gallery and Search.
- CLIP Ranked Artists: gallery sorted by weight/strength.
- https://publicprompts.art/ very basic/limited but some good prompts
stable diffusion specific notes
Main: https://github.com/CompVis/stable-diffusion
Required reading:
- param intuitionhttps://www.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
- CLI commands https://www.assemblyai.com/blog/how-to-run-stable-diffusion-locally-to-generate-images/#script-options
- Bundled Distros
- Web Distros
- https://www.mage.space/
- https://dreamlike.art/ has img2img
- https://inpainter.vercel.app/paint for inpainting
- https://promptart.labml.ai/feed
- https://www.strmr.com/ dreambooth tuning for $3
- Twitter Bots
- Windows "retard guides"
Prompt galleries and search engines:
- Lexica: Content-based search powered by OpenAI's CLIP model. Seed, CFG, Dimensions.
- OpenArt: Content-based search powered by OpenAI's CLIP model. Favorites.
- PromptHero: Random wall. Seed, CFG, Dimensions, Steps. Favorites.
- Libraire: Seed, CFG, Dimensions, Steps.
- Krea: modifiers focused UI. Favorites. Gives prompt suggestions and allows to create prompts over Stable diffusion, Waifu Diffusion and Disco diffusion. Really quick and useful
- Avyn: Search engine and generator.
- Pinegraph: discover, create and edit with Stable/Disco/Waifu diffusion models.
- Phraser: text and image search.
Visual search:
- Lexica: enter an image URL in the search bar. Or next to q=. Example
- Phraser: image icon at the right.
- same.energy
- Yandex, Bing, Google, Tineye, iqdb: reverse and similar image search engines.
- dessant/search-by-image: Open-source browser extension for reverse image search.
Prompt generators:
- promptoMANIA: Visual modifiers. Great selection. With weight setting.
- Phase.art: Visual modifiers. SD Generator and share.
- Phraser: Visual modifiers.
- AI Text Prompt Generator
- Dynamic Prompt generator
- succinctly/text2image: GPT-2 Midjourney trained text completion.
- Prompt Parrot colab: Train and generate prompts.
- cmdr2: 1-click SD installation with image modifiers selection.
Img2prompt:
- img2prompt Replicate by methexis-inc: Optimized for SD (clip ViT-L/14).
- CLIP Interrogator by @pharmapsychotic: select ViTL14 CLIP model.
- CLIP Artist Evaluator colab
- BLIP
Explore Artists, styles, and modifiers:
- Artist Style Studies & Modifier Studies by parrot zone: Gallery, Style, Spreadsheet
- Clip retrieval: search laion-5b dataset.
- Datasette: image search; image-count sort by artist, celebrities, characters, domain
- Visual arts: media list, related; Artists list by genre, medium; Portal
Guides and studies:
- Disco Diffusion Illustrated Settings
- Understanding MidJourney (and SD) through teapots.
- A Traveler’s Guide to the Latent Space
Prompt Tools directories and guides:
Other SD directories:
- Tools and Resources for AI Art by pharmapsychotic
- List of Stable Diffusion systems
- Active GitHub SD Forks: hlky sd-webui, AUTOMATIC1111, neonsecret, basujindal, lstein, Doggettx, deforum video
https://www.reddit.com/r/StableDiffusion/comments/wqaizj/list_of_stable_diffusion_systems/
Forks
- https://github.com/lkwq007/stablediffusion-infinity Outpainting with Stable Diffusion on an infinite canvas.
- https://github.com/basujindal/stable-diffusion This repo is a modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed.
- https://github.com/hlky/stable-diffusion (here is another fork that might be better)
- adds a bunch of features - GUI/webui, textual inversion, upscalers, mask and crop, img2img editor, word seeds, prompt weighting
- doesn't work on Mac Sygil-Dev/stable-diffusion#173
- How to Fine-tune Stable Diffusion using Textual Inversion https://towardsdatascience.com/how-to-fine-tune-stable-diffusion-using-textual-inversion-b995d7ecc095
- https://github.com/AbdBarho/stable-diffusion-webui-docker
- Run Stable Diffusion on your machine with a nice UI without any hassle! This repository provides the WebUI as a docker image for easy setup and deployment. Please note that the WebUI is experimental and evolving quickly, so expect some bugs.
- doesnt work on m1 mac yet AbdBarho/stable-diffusion-webui-docker#31
- adds a bunch of features - GUI/webui, textual inversion, upscalers, mask and crop, img2img editor, word seeds, prompt weighting
- https://github.com/invoke-ai/InvokeAI (previously https://github.com/lstein/stable-diffusion) and https://github.com/magnusviri/stable-diffusion
- An interactive command-line interface that accepts the same prompt and switches as the Discord bot.
- A basic Web interface that allows you to run a local web server for generating images in your browser.
- A notebook for running the code on Google Colab.
- Support for img2img in which you provide a seed image to guide the image creation. (inpainting & masking coming soon)
- Upscaling and face fixing using the optional ESRGAN (standalone: https://news.ycombinator.com/item?id=32628761) and GFPGAN packages.
- Weighted subprompts for prompt tuning.
- Textual inversion for customization of the prompt language and images.
- fuller feature list https://www.reddit.com/r/StableDiffusion/comments/xcclmf/comment/io6u03s/?utm_source=reddit&utm_medium=web2x&context=3
- https://github.com/bfirsh/stable-diffusion
- works on M1 Macs - blog, tweet
- can also look at
environment-mac.yaml
from https://github.com/fragmede/stable-diffusion/blob/mps_consistent_seed/environment-mac.yaml
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon
- another m1 mac compatible fork - only 2 samplers, Euler and DPM2, with real-ESRGAN upscaling
- https://colab.research.google.com/drive/1kw3egmSn-KgWsikYvOMjJkVDsPLjEMzl
- https://github.com/harubaru/waifu-diffusion
- nicer GUI for img2img
- fast-stable-diffusion colabs, +25% speed increase + memory efficient. https://github.com/TheLastBen/fast-stable-diffusion (be careful on gdrive security)
SD Tooling
- SD's DreamStudio https://beta.dreamstudio.ai/dream
- Stable Worlds: colab for 3d stitched worlds via StableDiffusion https://twitter.com/NaxAlpha/status/1578685845099290624
- Midjourney + SD: https://twitter.com/EMostaque/status/1561917541743841280
- Nightcafe Studio
- misc
- (super super raw dont try yet) https://github.com/breadthe/sd-buddy
Other languages
- Chinese: https://twitter.com/_akhaliq/status/1572580845785083906
- Japanese: https://twitter.com/_akhaliq/status/1571977273489739781
- How SD works
- https://huggingface.co/blog/stable_diffusion
- https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1?usp=sharing
- https://twitter.com/johnowhitaker/status/1565710033463156739
- https://twitter.com/ai__pub/status/1561362542487695360
- https://twitter.com/JayAlammar/status/1572297768693006337
- https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1?usp=sharing
- inside https://keras.io/guides/keras_cv/generate_images_with_stable_diffusion/#wait-how-does-this-even-work
- Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator
- A black and white photo of a young woman, studio lighting, realistic, Ilford HP5 400
- https://news.ycombinator.com/item?id=32642255#32646761
- For something like this, you ideally would want a powerful GPU with 12-24gb VRAM.
- A $500 RTX 3070 with 8GB of VRAM can generate 512x512 images with 50 steps in 7 seconds.
DallE banned so SD https://twitter.com/almost_digital/status/1556216820788609025?s=20&t=GCU5prherJvKebRrv9urdw
- Whisper
- https://huggingface.co/spaces/sensahin/YouWhisper YouWhisper converts Youtube videos to text using openai/whisper.
- https://twitter.com/jeffistyping/status/1573145140205846528 youtube whipserer
- multilingual subtitles https://twitter.com/1littlecoder/status/1573030143848722433
- video subtitles https://twitter.com/m1guelpf/status/1574929980207034375
- you can join whisper to stable diffusion for reasons https://twitter.com/fffiloni/status/1573733520765247488/photo/1
- known problems https://twitter.com/lunixbochs/status/1574848899897884672 (edge case with catastrophic failures)
- textually guided audio https://twitter.com/FelixKreuk/status/1575846953333579776
- Codegen
- pdf to structured data https://www.impira.com/blog/hey-machine-whats-my-invoice-total
- text to Human Motion diffusion https://twitter.com/GuyTvt/status/1577947409551851520
- abs: https://arxiv.org/abs/2209.14916
- project page: https://guytevet.github.io/mdm-page/