notes on AI state of the art, with a focus on generative and large language models. These are the "raw materials" for the https://lspace.swyx.io/ newsletter.
This repo used to be called https://github.com/sw-yx/prompt-eng, but was renamed because Prompt Engineering is Overhyped.
This Readme is just the high level overview of the space; you should see the most updates in the OTHER markdown files in this repo:
IMAGE_GEN.md
- the most developed file, with the heaviest emphasis notes on Stable Diffusion, and some on midjourney and dalle.IMAGE_PROMPTS.md
- a small swipe file of good image prompts
TEXT.md
- text generation, mostly with GPT-4TEXT_CHAT.md
- information on ChatGPT and competitors, as well as derivative productsTEXT_SEARCH.md
- information on GPT-4 enabled semantic search and other infoTEXT_PROMPTS.md
- a small swipe file of good GPT3 prompts
INFRA.md
- raw notes on AI Infrastructure, Hardware and ScalingAUDIO.md
- tracking audio/music/voice transcription + generation- stub notes - very small/lightweight proto pages
AGENTS.md
- tracking "agentic AI"CODE.md
- codegen models, like Copilot- etc...
Table of Contents
- images
- video
- img2img of famous movie scenes (lalaland)
- img2img transforming actor with ebsynth + koe_recast
- how ebsynth works https://twitter.com/TomLikesRobots/status/1612047103806545923?s=20
- virtual fashion (karenxcheng)
- seamless tiling images
- evolution of scenes (xander)
- outpainting https://twitter.com/orbamsterdam/status/1568200010747068417?s=21&t=rliacnWOIjJMiS37s8qCCw
- webUI img2img collaboration https://twitter.com/_akhaliq/status/1563582621757898752
- image to video with rotation https://twitter.com/TomLikesRobots/status/1571096804539912192
- "prompt paint" https://twitter.com/1littlecoder/status/1572573152974372864
- audio2video animation of your face https://twitter.com/siavashg/status/1597588865665363969
- physical toys to 3d model + animation https://twitter.com/sergeyglkn/status/1587430510988611584
- music videos
- video killed the radio star, colab This uses OpenAI's Whisper speech-to-text, allowing you to take a YouTube video & create a Stable Diffusion animation prompted by the lyrics in the YouTube video
- Stable Diffusion Videos generates videos by interpolating between prompts and audio
- direct text2video project
- img2img of famous movie scenes (lalaland)
- text-to-3d https://twitter.com/_akhaliq/status/1575541930905243652
- text products
- has a list of usecases at the end https://huyenchip.com/2023/04/11/llm-engineering.html
- Jasper
- GPT for Obsidian https://reasonabledeviations.com/2023/02/05/gpt-for-second-brain/
- gpt3 email https://github.com/sw-yx/gpt3-email
- gpt3() in google sheet 2020, 2022 - sheet google sheets https://twitter.com/mehran__jalali/status/1608159307513618433
- https://www.summari.com/ Summari helps busy people read more
- market maps/landscapes
- sequoia market map https://twitter.com/sonyatweetybird/status/1584580362339962880
- base10 market map https://twitter.com/letsenhance_io/status/1594826383305449491
- matt shumer market map https://twitter.com/mattshumer_/status/1620465468229451776 https://docs.google.com/document/d/1sewTBzRF087F6hFXiyeOIsGC1N4N3O7rYzijVexCgoQ/edit
- nfx https://www.nfx.com/post/generative-ai-tech-5-layers?ref=context-by-cohere
- a16z https://a16z.com/2023/01/19/who-owns-the-generative-ai-platform/?ref=context-by-cohere
- madrona https://www.madrona.com/foundation-models/
- game assets -
- emad thread https://twitter.com/EMostaque/status/1591436813750906882
- scenario.gg https://twitter.com/emmanuel_2m/status/1593356241283125251
- 3d game character modeling example
- MarioGPT https://arxiv.org/pdf/2302.05981.pdf https://www.slashgear.com/1199870/mariogpt-uses-ai-to-generate-endless-super-mario-levels-for-free/ https://github.com/shyamsn97/mario-gpt/blob/main/mario_gpt/level.py
The more advanced GPT3 reads have been split out to https://github.com/sw-yx/ai-notes/blob/main/TEXT.md
- [Bill Gates on AI](https://www.gatesnotes.com/The-Age-of-AI-Has-Begun
- Steve Yegge on AI for developers
- excellent introduction to foundation models from MSR https://youtu.be/HQI6O5DlyFc
- openAI prompt tutorial https://beta.openai.com/docs/quickstart/add-some-examples
- google LAMDA intro https://aitestkitchen.withgoogle.com/how-lamda-works
- DALLE2 prompt writing book http://dallery.gallery/wp-content/uploads/2022/07/The-DALL%C2%B7E-2-prompt-book-v1.02.pdf
- https://medium.com/nerd-for-tech/prompt-engineering-the-career-of-future-2fb93f90f117
- https://ourworldindata.org/brief-history-of-ai ai progress overview with nice charts
- Jon Stokes' AI Content Generation, Part 1: Machine Learning Basics
- What are transformer models and how do they work? - maybe a bit too high level
- text generation
- humanloop's prompt engineering 101
- Stephen Wolfram's explanations https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
- https://andymatuschak.org/prompts/
- https://www.jonstokes.com/p/chatgpt-explained-a-guide-for-normies for normies
- image generation
- https://wiki.installgentoo.com/wiki/Stable_Diffusion overview
- https://www.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
- https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/
- https://www.kdnuggets.com/2021/03/beginners-guide-clip-model.html
- for nontechnical
- State of AI Report: 2018, 2019, 2020, 2021, 2022
- reverse chronological major events https://bleedingedge.ai/
- https://cims.nyu.edu/~sbowman/eightthings.pdf
- LLMs predictably get more capable with increasing investment, even without targeted innovation.
- Many important LLM behaviors emerge unpredictably as a byproduct of increasing investment.
- LLMs often appear to learn and use representations of the outside world.
- There are no reliable techniques for steering the behavior of LLMs.
- Experts are not yet able to interpret the inner workings of LLMs.
- Human performance on a task isn’t an upper bound on LLM performance.
- LLMs need not express the values of their creators nor the values encoded in web text.
- Brief interactions with LLMs are often misleading.
- openai prompt eng cookbook https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
- on prompt eng overview https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
- Recap of 2022's major AI developments https://www.deeplearning.ai/the-batch/issue-176/
- DALLE2 asset generation + inpainting https://twitter.com/aifunhouse/status/1576202480936886273?s=20&t=5EXa1uYDPVa2SjZM-SxhCQ
- suhail journey https://twitter.com/Suhail/status/1541276314485018625?s=20&t=X2MVKQKhDR28iz3VZEEO8w
- composable diffusion - "AND" instead of "and" https://twitter.com/TomLikesRobots/status/1580293860902985728
- on BPE tokenization https://towardsdatascience.com/byte-pair-encoding-subword-based-tokenization-algorithm-77828a70bee0 see also google sentencepiece and openai tiktoken
- source in GPT2 source https://github.com/openai/gpt-2/blob/master/src/encoder.py
- note that BPEs are suboptimal https://www.lesswrong.com/posts/dFbfCLZA4pejckeKc/a-mechanistic-explanation-for-solidgoldmagikarp-like-tokens?commentId=9jNdKscwEWBB4GTCQ
- causes math and string character issues https://news.ycombinator.com/item?id=35363769
- https://platform.openai.com/tokenizer and https://github.com/openai/tiktoken (more up to date: https://tiktokenizer.vercel.app/)
- creates its own language https://twitter.com/giannis_daras/status/1531693104821985280
- img2img https://andys.page/posts/how-to-draw/
- on language modeling https://lena-voita.github.io/nlp_course/language_modeling.html and approachable but technical explanation of language generation including sampling from distributions and some mechanistic intepretability (finding neuron that tracks quote state)
- quest for photorealism https://www.reddit.com/r/StableDiffusion/comments/x9zmjd/quest_for_ultimate_photorealism_part_2_colors/
- settings tweaking https://www.reddit.com/r/StableDiffusion/comments/x3k79h/the_feeling_of_discovery_sd_is_like_a_great_proc/
- seed selection https://www.reddit.com/r/StableDiffusion/comments/x8szj9/tutorial_seed_selection_and_the_impact_on_your/
- minor parameter parameter difference study (steps, clamp_max, ETA, cutn_batches, etc) https://twitter.com/KyrickYoung/status/1500196286930292742
- Generative AI: Autocomplete for everything https://noahpinion.substack.com/p/generative-ai-autocomplete-for-everything?sd=pf
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources good paper with the development history of the GPT family of models and how the capabilities developed
- Transformers from scratch https://e2eml.school/transformers.html
- transformers vs LSTM https://medium.com/analytics-vidhya/why-are-lstms-struggling-to-matchup-with-transformers-a1cc5b2557e3
- transformer code walkthru https://twitter.com/mark_riedl/status/1555188022534176768
- transformer familyi https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/
- carmack paper list https://news.ycombinator.com/item?id=34639634
- Transformer models: an introduction and catalog https://arxiv.org/abs/2302.07730
- Deepmind - formal algorithms for transformers https://arxiv.org/pdf/2207.09238.pdf
- Jay Alammar explainers
- karpathy on transformers
- Convergence: The ongoing consolidation in AI is incredible. When I started ~decade ago vision, speech, natural language, reinforcement learning, etc. were completely separate; You couldn't read papers across areas - the approaches were completely different, often not even ML based. In 2010s all of these areas started to transition 1) to machine learning and specifically 2) neural nets. The architectures were diverse but at least the papers started to read more similar, all of them utilizing large datasets and optimizing neural nets. But as of approx. last two years, even the neural net architectures across all areas are starting to look identical - a Transformer (definable in ~200 lines of PyTorch https://github.com/karpathy/minGPT/blob/master/mingpt/model.py…), with very minor differences. Either as a strong baseline or (often) state of the art. (tweetstorm)
- Why Transformers won: The Transformer is a magnificient neural network architecture because it is a general-purpose differentiable computer. It is simultaneously: 1) expressive (in the forward pass) 2) optimizable (via backpropagation+gradient descent) 3) efficient (high parallelism compute graph) tweetstorm
- BabyGPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modifies the probabilities on the arrows.
- Build GPT from scratch https://www.youtube.com/watch?v=kCc8FmEb1nY
- different GPT from scratch in 60 LOC https://jaykmody.com/blog/gpt-from-scratch/
- 137 emergent abilities of large language models
- Emergent few-shot prompted tasks: BIG-Bench and MMLU benchmarks
- Emergent prompting strategies
- Instruction-following
- Scratchpad
- Using open-book knowledge for fact checking
- Chain-of-thought prompting
- Differentiable search index
- Self-consistency
- Leveraging explanations in prompting
- Least-to-most prompting
- Zero-shot chain-of-thought
- Calibration via P(True)
- Multilingual chain-of-thought
- Ask-me-anything prompting
- Images
- Eugene Yan explanation of the Text to Image stack https://eugeneyan.com/writing/text-to-image/
- VQGAN/CLIP https://minimaxir.com/2021/08/vqgan-clip/
- 10 years of Image generation history https://zentralwerkstatt.org/blog/ten-years-of-image-synthesis
- Vision Transformers (ViT) Explained https://www.pinecone.io/learn/vision-transformers/
- negative prompting https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/
- best papers of 2022 https://www.yitay.net/blog/2022-best-nlp-papers
- Predictability and Surprise in Large Generative Models - good survey paper of what we know about scaling and capabilities and rise of LLMs so far
- more prompt eng papers https://github.com/dair-ai/Prompt-Engineering-Guide
- https://creator.nightcafe.studio/vqgan-clip-keyword-modifier-comparison VQGAN+CLIP Keyword Modifier Comparison
- History of Transformers
- https://kipp.ly/blog/transformer-taxonomy/ This document is my running literature review for people trying to catch up on AI. It covers 22 models, 11 architectural changes, 7 post-pre-training techniques and 3 training techniques (and 5 things that are none of the above)
- https://magazine.sebastianraschka.com/p/understanding-large-language-models
- https://finbarrtimbers.substack.com/p/five-years-of-progress-in-gpts GPT1-3, Megatron, Gopher, Chinchilla, PaLM, LLaMa
- good summary paper (8 things to know) https://cims.nyu.edu/~sbowman/eightthings.pdf
We compared 126 keyword modifiers with the same prompt and initial image. These are the results.
- https://creator.nightcafe.studio/collection/8dMYgKm1eVXG7z9pV23W
- Google released PartiPrompts as a benchmark: https://parti.research.google/ "PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects."
- Video tutorials
- History of papers
- 2008: Unified Architecture for NLP (Collobert-Weston) https://twitter.com/ylecun/status/1611921657802768384
- 2015: Semi-supervised sequence learning https://twitter.com/deliprao/status/1611896130589057025?s=20
- 2017: Transformers (Vaswani et al)
- 2018: GPT (Radford et al)
- Misc
- StabilityAI CIO perspective https://danieljeffries.substack.com/p/the-turning-point-for-truly-open?sd=pf
- https://github.com/awesome-stable-diffusion/awesome-stable-diffusion
- https://github.com/microsoft/LMOps guide to msft prompt research
- gwern's behind the scenes discussion of Bing, GPT4, and the Microsoft-OpenAI relationship https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned
- https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
- https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md#llama-models
- StableDiffusion Discord https://discord.com/invite/stablediffusion
- LAION discord https://discord.gg/xBPBXfcFHd
- Eleuther discord: https://www.eleuther.ai/get-involved/ (primer)
- https://reddit.com/r/stableDiffusion
- Akhaliq Discord: https://discord.gg/nYqfg4gnBt
- Karpathy Discord: https://discord.gg/3zy8kqD9Cp
- HuggingFace Discord: https://discuss.huggingface.co/t/join-the-hugging-face-discord/11263
- Deforum Discord https://discord.gg/upmXXsrwZc
- Lexica Discord https://discord.com/invite/bMHBjJ9wRh
- Perplexity Discord https://discord.com/invite/kWJZsxPDuX
- Chatgpt Hackers https://www.chatgpthackers.dev/
- Agents
- AutoGPT discord
- BabyAGI discord
- Midjourney's discord
- how to use midjourney v4 https://twitter.com/fabianstelzer/status/1588856386540417024?s=20&t=PlgLuGAEEds9HwfegVRrpg
- https://stablehorde.net/
This list will be out of date but will get you started. My live list of people to follow is at: https://twitter.com/i/lists/1585430245762441216
- Researchers/Developers
- https://twitter.com/_jasonwei
- https://twitter.com/johnowhitaker/status/1565710033463156739
- https://twitter.com/altryne/status/1564671546341425157
- https://twitter.com/SchmidhuberAI
- https://twitter.com/nearcyan
- https://twitter.com/karinanguyen_
- https://twitter.com/abhi_venigalla
- https://twitter.com/advadnoun
- https://twitter.com/polynoamial
- https://twitter.com/vovahimself
- https://twitter.com/sarahookr
- https://twitter.com/shaneguML
- https://twitter.com/MaartenSap
- https://twitter.com/ethanCaballero
- https://twitter.com/ShayneRedford
- https://twitter.com/seb_ruder
- https://twitter.com/rasbt
- https://twitter.com/wightmanr
- https://twitter.com/GaryMarcus
- https://twitter.com/ylecun
- https://twitter.com/karpathy
- https://twitter.com/pirroh
- https://twitter.com/eerac
- News/Aggregators
- https://twitter.com/ai__pub
- https://twitter.com/WeirdStableAI
- https://twitter.com/multimodalart
- https://twitter.com/LastWeekinAI
- https://twitter.com/paperswithcode
- https://twitter.com/DeepLearningAI_
- https://twitter.com/dl_weekly
- https://twitter.com/slashML
- https://twitter.com/_akhaliq
- https://twitter.com/aaditya_ai
- https://twitter.com/bentossell
- https://twitter.com/johnvmcdonnell
- Founders/Builders/VCs
- https://twitter.com/levelsio
- https://twitter.com/goodside
- https://twitter.com/c_valenzuelab
- https://twitter.com/Raza_Habib496
- https://twitter.com/sharifshameem/status/1562455690714775552
- https://twitter.com/genekogan/status/1555184488606564353
- https://twitter.com/levelsio/status/1566069427501764613?s=20&t=camPsWtMHdSSEHqWd0K7Ig
- https://twitter.com/amanrsanger
- https://twitter.com/ctjlewis
- https://twitter.com/sarahcat21
- https://twitter.com/jackclarkSF
- https://twitter.com/alexandr_wang
- https://twitter.com/rameerez
- https://twitter.com/scottastevenson
- https://twitter.com/denisyarats
- Stability
- OpenAI
- HuggingFace
- Artists
- Other
- Bots and Apps
- Narrow, tedium domain usecases https://twitter.com/WillManidis/status/1584900092615528448 and https://twitter.com/WillManidis/status/1584900100480192516
- antihype https://twitter.com/alexandr_wang/status/1573302977418387457
- antihype https://twitter.com/fchollet/status/1612142423425138688?s=46&t=pLCNW9pF-co4bn08QQVaUg
- prompt eng memes
- things stablediffusion struggles with https://opguides.info/posts/aiartpanic/
- New Google
- New Powerpoint
- via emad
- Appending prompts by default in UI
- DALLE: https://twitter.com/levelsio/status/1588588688115912705?s=20&t=0ojpGmH9k6MiEDyVG2I6gg
- NSFW filter https://vickiboykis.com/2022/11/18/some-notes-on-the-stable-diffusion-safety-filter/
- On "AI Art Panic" https://opguides.info/posts/aiartpanic/
- Yannick influencing OPENRAIL-M https://www.youtube.com/watch?v=W5M-dvzpzSQ
- art schools accepting AI art https://twitter.com/DaveRogenmoser/status/1597746558145265664
- DRM issues https://undeleted.ronsor.com/voice.ai-gpl-violations-with-a-side-of-drm/
- stealing art https://stablediffusionlitigation.com
- http://www.stablediffusionfrivolous.com/
- stable attribution https://news.ycombinator.com/item?id=34670136
- coutner argument for disney https://twitter.com/jonst0kes/status/1616219435492163584?s=46&t=HqQqDH1yEwhWUSQxYTmF8w
- research on stable diffusion copying https://twitter.com/officialzhvng/status/1620535905298817024?s=20&t=NC-nW7pfDa8nyRD08Lx1Nw This paper used Stable Diffusion to generate 175 million images over 350,000 prompts and only found 109 near copies of training data. Am I right that my main takeaway from this is how good Stable Diffusion is at not memorizing training examples?
- Anthropic - https://arxiv.org/pdf/2112.00861.pdf
- Helpful: attempt to do what is ask. concise, efficient. ask followups. redirect bad questions.
- Honest: give accurate information, express uncertainty. don't imitate responses expected from an expert if it doesn't have the capabilities/knowledge
- Harmless: not offensive/discriminatory. refuse to assist dangerous acts. recognize when providing sensitive/consequential advice
- criticism and boundaries as future direction https://twitter.com/davidad/status/1628489924235206657?s=46&t=TPVwcoqO8qkc7MuaWiNcnw
- Just Eliezer entire body of work
- https://twitter.com/esyudkowsky/status/1625922986590212096
- agi list of lethalities https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
- note that eliezer has made controversial comments in the past and also in recent times (TIME article)
- Connor Leahy may be a more sane/measured/technically competent version of yud https://overcast.fm/+aYlOEqTJ0
- it's not just paperclip factories
- https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like
- the 6 month pause letter
- chinese regulation https://www.chinalawtranslate.com/en/overview-of-draft-measures-on-generative-ai/
- italy banning chatgpt
- Whisper
- https://huggingface.co/spaces/sensahin/YouWhisper YouWhisper converts Youtube videos to text using openai/whisper.
- https://twitter.com/jeffistyping/status/1573145140205846528 youtube whipserer
- multilingual subtitles https://twitter.com/1littlecoder/status/1573030143848722433
- video subtitles https://twitter.com/m1guelpf/status/1574929980207034375
- you can join whisper to stable diffusion for reasons https://twitter.com/fffiloni/status/1573733520765247488/photo/1
- known problems https://twitter.com/lunixbochs/status/1574848899897884672 (edge case with catastrophic failures)
- textually guided audio https://twitter.com/FelixKreuk/status/1575846953333579776
- Codegen
- pdf to structured data - Impira used t to do it (dead link: https://www.impira.com/blog/hey-machine-whats-my-invoice-total) but if you look hard enough on twitter there are some alternatives
- text to Human Motion diffusion https://twitter.com/GuyTvt/status/1577947409551851520
- abs: https://arxiv.org/abs/2209.14916
- project page: https://guytevet.github.io/mdm-page/