SnowdenLee / Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.



Awesome Controllable T2I Diffusion Models


We focus on how to control text-to-image diffusion models with novel conditions.

For more detailed information, please refer to our survey paper: Controllable Generation with Text-to-Image Diffusion Models: A Survey


Citation

@article{cao2024controllable,
  title={Controllable Generation with Text-to-Image Diffusion Models: A Survey},
  author={Pu Cao and Feng Zhou and Qing Song and Lu Yang},
  journal={arXiv preprint arXiv:2403.04279},
  year={2024}
}

🌈 Contents

🚀Generation with Specific Condition

Personalization

Subject-Driven Generation

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or.
ICLR 2023. [PDF]

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman.
CVPR 2023. [PDF]

Re-Imagen: Retrieval-Augmented Text-to-Image Generator.
Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen.
ICLR 2023. [PDF]

DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning.
Ziyi Dong, Pengxu Wei, Liang Lin.
arXiv 2022. [PDF]

Multi-Concept Customization of Text-to-Image Diffusion.
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu.
CVPR 2023. [PDF]

Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics.
Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin.
NeurIPS 2023. [PDF]

Designing an Encoder for Fast Personalization of Text-to-Image Models.
Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or.
arXiv 2023. [PDF]

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation.
Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, Wangmeng Zuo.
ICCV 2023. [PDF]

Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation.
Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu.
arXiv 2023. [PDF]

P+: Extended Textual Conditioning in Text-to-Image Generation.
Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, Kfir Aberman.
arXiv 2023. [PDF]

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning.
Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang.
ICCV 2023. [PDF]

A Closer Look at Parameter-Efficient Tuning in Diffusion Models.
Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu.
arXiv 2023. [PDF]

Subject-driven Text-to-Image Generation via Apprenticeship Learning.
Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen.
NeurIPS 2023. [PDF]

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models.
Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su.
arXiv 2023. [PDF]

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning.
Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung.
arXiv 2023. [PDF]

Controllable Textual Inversion for Personalized Text-to-Image Generation.
Jianan Yang, Haobo Wang, Yanming Zhang, Ruixuan Xiao, Sai Wu, Gang Chen, Junbo Zhao.
arXiv 2023. [PDF]

Gradient-Free Textual Inversion.
Zhengcong Fei, Mingyuan Fan, Junshi Huang.
ACM MM 2023. [PDF]

Key-Locked Rank One Editing for Text-to-Image Personalization.
Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon.
SIGGRAPH 2023. [PDF]

DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation.
Hong Chen, Yipeng Zhang, Simin Wu, Xin Wang, Xuguang Duan, Yuwei Zhou, Wenwu Zhu.
arXiv 2023. [PDF]

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.
Dongxu Li, Junnan Li, Steven C. H. Hoi.
NeurIPS 2023. [PDF]

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models.
Yuxin Zhang, Weiming Dong, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Oliver Deussen, Changsheng Xu.
TOG 2023. [PDF]

Break-A-Scene: Extracting Multiple Concepts from a Single Image.
Omri Avrahami, Kfir Aberman, Ohad Fried, Daniel Cohen-Or, Dani Lischinski.
SIGGRAPH Asia 2023. [PDF]

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models.
Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan.
arXiv 2023. [PDF]

Controlling Text-to-Image Diffusion by Orthogonal Finetuning.
Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf.
NeurIPS 2023. [PDF]

Generate Anything Anywhere in Any Scene.
Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee.
arXiv 2023. [PDF]

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models.
Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano.
SIGGRAPH Asia 2023. [PDF]

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning.
Jian Ma, Junhao Liang, Chen Chen, Haonan Lu.
ICLR 2024 (3566). [PDF]

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation.
Shin-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong Oh, Yanmin Gong.
ICLR 2024. [PDF]

Kosmos-G: Generating Images in Context with Multimodal Large Language Models.
Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei.
arXiv 2023. [PDF]

An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning.
Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare.
arXiv 2023. [PDF]

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization.
Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Helge Rhodin, Ratheesh Kalarot.
arXiv 2023. [PDF]

DiffNat: Improving Diffusion Image Quality Using Natural Image Statistics.
Aniket Roy, Maiterya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa.
arXiv 2023. [PDF]

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis.
Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan.
arXiv 2023. [PDF]

Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models.
Saman Motamed, Danda Pani Paudel, Luc Van Gool.
arXiv 2023. [PDF]

CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization.
Ruoyu Zhao, Mingrui Zhu, Shiyin Dong, Nannan Wang, Xinbo Gao.
arXiv 2023. [PDF]

CLiC: Concept Learning in Context.
Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri.
arXiv 2023. [PDF]

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.
Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang.
arXiv 2023. [PDF]

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models.
Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou.
arXiv 2023. [PDF]

VideoBooth: Diffusion-based Video Generation with Image Prompts.
Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu.
arXiv 2023. [PDF]

Customization Assistant for Text-to-image Generation.
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun.
arXiv 2023. [PDF]

Decoupled Textual Embeddings for Customized Image Generation.
Yufei Cai, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hu Han, Wangmeng Zuo.
arXiv 2023. [PDF]

DreamTuner: Single Image is Enough for Subject-Driven Generation.
Miao Hua, Jiawei Liu, Fei Ding, Wei Liu, Jie Wu, Qian He.
arXiv 2023. [PDF]

Person-Driven Generation

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention.
Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han.
arXiv 2023. [PDF]

Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation.
Nico Giambi, Giuseppe Lisanti.
arXiv 2023. [PDF]

Face0: Instantaneously Conditioning a Text-to-Image Model on a Face.
Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan.
arXiv 2023. [PDF]

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation.
Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao.
arXiv 2023. [PDF]

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models.
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman.
arXiv 2023. [PDF]

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models.
Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng.
arXiv 2023. [PDF]

MagiCapture: High-Resolution Multi-Concept Portrait Customization.
Junha Hyung, Jaeyo Shin, Jaegul Choo.
arXiv 2023. [PDF]

High-fidelity Person-centric Subject-to-Image Synthesis.
Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin.
arXiv 2023. [PDF]

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation.
Xiaoming Li, Xinyu Hou, Chen Change Loy.
arXiv 2023. [PDF]

Retrieving Conditions from Reference Images for Diffusion Models.
Haoran Tang, Xin Zhou, Jieren Deng, Zhihong Pan, Hao Tian, Pratik Chaudhari.
arXiv 2023. [PDF]

FaceStudio: Put Your Face Everywhere in Seconds.
Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu.
arXiv 2023. [PDF]

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet.
Soon Yau Cheong, Armin Mustafa, Andrew Gilbert.
arXiv 2023. [PDF]

DemoCaricature: Democratising Caricature Generation with a Rough Sketch.
Dar-Yen Chen, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song.
arXiv 2023. [PDF]

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.
Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan.
arXiv 2023. [PDF]

Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods.
Panos Achlioptas, Alexandros Benetatos, Iordanis Fostiropoulos, Dimitris Skourtis.
arXiv 2023. [PDF]

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization.
Xu Peng, Junwei Zhu, Boyuan Jiang, Ying Tai, Donghao Luo, Jiangning Zhang, Wei Lin, Taisong Jin, Chengjie Wang, Rongrong Ji.
arXiv 2023. [PDF]

Style-Driven Generation

StyleDrop: Text-to-Image Generation in Any Style.
Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan.
arXiv 2023. [PDF]

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter.
Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang, Yujiu Yang, Ying Shan.
arXiv 2023. [PDF]

ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation.
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu.
arXiv 2023. [PDF]

Style Aligned Image Generation via Shared Attention.
Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or.
arXiv 2023. [PDF]

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method.
Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan.
arXiv 2023. [PDF]

Interaction-Driven Generation

ReVersion: Diffusion-Based Relation Inversion from Images.
Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C. K. Chan, Ziwei Liu.
arXiv 2023. [PDF]

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai.
ICLR 2024 (6688). [PDF]

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou.
arXiv 2023. [PDF]

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation.
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang.
arXiv 2023. [PDF]

SAVE: Protagonist Diversification with Structure Agnostic Video Editing.
Yeji Song, Wonsik Shin, Junsoo Lee, Jeesoo Kim, Nojun Kwak.
arXiv 2023. [PDF]

Customizing Motion in Text-to-Video Diffusion Models.
Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell.
arXiv 2023. [PDF]

DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models.
Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie.
arXiv 2023. [PDF]

MotionCrafter: One-Shot Motion Customization of Diffusion Models.
Yuxin Zhang, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, Weiming Dong, Changsheng Xu.
arXiv 2023. [PDF]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan, Yap-Peng Tan, Weipeng Hu.
arXiv 2023. [PDF]

Image-Driven Generation

Hierarchical Text-Conditional Image Generation with CLIP Latents.
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen.
arXiv 2022. [PDF]

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model.
Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, Humphrey Shi.
ICCV 2023. [PDF]

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models.
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi.
arXiv 2023. [PDF]

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models.
Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong.
NeurIPS 2023. [PDF]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models.
Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang.
arXiv 2023. [PDF]

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet.
Soon Yau Cheong, Armin Mustafa, Andrew Gilbert.
arXiv 2023. [PDF]

Context Diffusion: In-Context Aware Image Generation.
Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic.
arXiv 2023. [PDF]

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition.
Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou.
arXiv 2023. [PDF]

Distribution-Driven Generation

Concept-centric Personalization with Large-scale Diffusion Priors.
Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song.
arXiv 2023. [PDF]

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models.
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge.
arXiv 2023. [PDF]

Spatial Control

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers.
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu.
arXiv 2022. [PDF]

Sketch-Guided Text-to-Image Diffusion Models.
Andrey Voynov, Kfir Aberman, Daniel Cohen-Or.
SIGGRAPH 2023. [PDF]

SpaText: Spatio-Textual Representation for Controllable Image Generation.
Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin.
CVPR 2023. [PDF]

GLIGEN: Open-Set Grounded Text-to-Image Generation.
Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee.
CVPR 2023. [PDF]

Universal Guidance for Diffusion Models.
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein.
CVPRW 2023. [PDF]

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation.
Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li.
arXiv 2023. [PDF]

Modulating Pretrained Diffusion Models for Multimodal Image Synthesis.
Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz.
SIGGRAPH 2023. [PDF]

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang.
arXiv 2023. [PDF]

Freestyle Layout-to-Image Synthesis.
Han Xue, Zhiwu Huang, Qianru Sun, Li Song, Wenjun Zhang.
CVPR 2023. [PDF]

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation.
Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li.
CVPR 2023. [PDF]

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation.
Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu.
ICCV 2023. [PDF]

Late-Constraint Diffusion Guidance for Controllable Image Synthesis.
Chang Liu, Dong Liu.
arXiv 2023. [PDF]

Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation.
Nico Giambi, Giuseppe Lisanti.
arXiv 2023. [PDF]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation.
Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung.
ICLR 2024. [PDF]

Grounded Text-to-Image Synthesis with Attention Refocusing.
Quynh Phung, Songwei Ge, Jia-Bin Huang.
arXiv 2023. [PDF]

Zero-shot spatial layout conditioning for text-to-image diffusion models.
Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek.
ICCV 2023. [PDF]

Localized Text-to-Image Generation For Free via Cross Attention Control.
Yutong He, Ruslan Salakhutdinov, J. Zico Kolter.
arXiv 2023. [PDF]

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation.
Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang.
arXiv 2023. [PDF]

Dense Text-to-Image Generation with Attention Modulation.
Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu.
ICCV 2023. [PDF]

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive.
Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva.
ICLR 2024. [PDF]

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling.
Jingyang Zhang, Shiwei Li, Yuanxun Lu, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Yao Yao.
ICLR 2024. [PDF]

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion.
Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov.
ICLR 2024. [PDF]

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-Image Generation.
Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang.
arXiv 2023. [PDF]

Enhancing Object Coherence in Layout-to-Image Synthesis.
Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin.
arXiv 2023. [PDF]

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis.
Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan.
arXiv 2023. [PDF]

LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis.
Peiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou.
arXiv 2023. [PDF]

AnyLens: A Generative Diffusion Model with Any Rendering Lens.
Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or.
arXiv 2023. [PDF]

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis.
Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang.
arXiv 2023. [PDF]

LooseControl: Lifting ControlNet for Generalized Depth Conditioning.
Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka.
arXiv 2023. [PDF]

DemoCaricature: Democratising Caricature Generation with a Rough Sketch.
Dar-Yen Chen, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song.
arXiv 2023. [PDF]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan, Yap-Peng Tan, Weipeng Hu.
arXiv 2023. [PDF]

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition.
Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou.
arXiv 2023. [PDF]

Local Conditional Controlling for Text-to-Image Diffusion Models.
Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu.
arXiv 2023. [PDF]

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing.
Zeyinzi Jiang, Chaojie Mao, Yulin Pan, Zhen Han, Jingfeng Zhang.
arXiv 2023. [PDF]

Towards Flexible, Scalable, and Adaptive Multi-Modal Conditioned Face Synthesis.
Jingjing Ren, Cheng Xu, Haoyu Chen, Xinran Qin, Chongyi Li, Lei Zhu.
arXiv 2023. [PDF]

Advanced Text-Conditioned Generation

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.
Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang.
arXiv 2022. [PDF]

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models.
Hila Chefer, Yuval Alaluf, Yael Vinker, Lior Wolf, Daniel Cohen-Or.
TOG 2023. [PDF]

Divide & Bind Your Attention for Improved Generative Semantic Nursing.
Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva.
BMVC 2023. [PDF]

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation.
Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu.
arXiv 2023. [PDF]

Expressive Text-to-Image Generation with Rich Text.
Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang.
ICCV 2023. [PDF]

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment.
Royi Rassin, Eran Hirsch, Daniel Glickman, Shauli Ravfogel, Yoav Goldberg, Gal Chechik.
NeurIPS 2023. [PDF]

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting.
Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan.
arXiv 2023. [PDF]

Paragraph-to-Image Generation with Information-Enriched Diffusion Model.
Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang.
arXiv 2023. [PDF]

PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation.
Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu.
arXiv 2023. [PDF]

In-Context Generation

In-Context Learning Unlocked for Diffusion Models.
Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou.
arXiv 2023. [PDF]

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts.
Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou.
arXiv 2023. [PDF]

Brain-Guided Generation

Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding.
Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, Juan Helen Zhou.
CVPR 2023. [PDF]

High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity.
Yu Takagi, Shinji Nishimoto.
CVPR 2023.

Natural scene reconstruction from fMRI signals using generative latent diffusion.
Furkan Ozcelik, Rufin VanRullen.
Scientific Reports 2023. [PDF]

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion.
Yizhuo Lu, Changde Du, Dianpeng Wang, Huiguang He.
ACM MM 2023. [PDF]

Natural Image Reconstruction from fMRI Based on Self-supervised Representation Learning and Latent Diffusion Model.
Pengyu Ni, Yifeng Zhang.
ACM MM 2023.

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals.
Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, Ying Shan.
arXiv 2023. [PDF]

BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction.
Honghao Fu, Zhiqi Shen, Jing Jih Chin, Hao Wang.
arXiv 2023. [PDF]

Sound-Guided Generation

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation.
Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu.
arXiv 2023. [PDF]

Align, Adapt and Inject: Sound-guided Unified Image Generation.
Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo.
arXiv 2023. [PDF]

Text Rendering

Character-Aware Models Improve Visual Text Rendering.
Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant.
ACL 2022. [PDF]

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image.
Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin.
arXiv 2023. [PDF]

TextDiffuser: Diffusion Models as Text Painters.
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei.
arXiv 2023. [PDF]

GlyphControl: Glyph Conditional Control for Visual Text Generation.
Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen.
NeurIPS 2023. [PDF]

AnyText: Multilingual Visual Text Generation And Editing.
Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie.
arXiv 2023. [PDF]

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering.
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei.
arXiv 2023. [PDF]

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models.
Yiming Zhao, Zhouhui Lian.
arXiv 2023. [PDF]

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model.
Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao.
AAAI 2024. [PDF]

⭐Generation with Multiple Conditions

Joint Training

Composer: Creative and controllable image synthesis with composable conditions.
Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou.
ICML 2023. [PDF]

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning.
Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang.
ICCV 2023. [PDF]

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention.
Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han.
arXiv 2023. [PDF]

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation.
Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham.
NeurIPS 2023. [PDF]

Continual Learning

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA.
James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin.
arXiv 2023. [PDF]

Create Your World: Lifelong Text-to-Image Diffusion.
Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong.
arXiv 2023. [PDF]

Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters.
James Seale Smith, Yen-Chang Hsu, Zsolt Kira, Yilin Shen, Hongxia Jin.
arXiv 2023. [PDF]

Weight Fusion

Multi-Concept Customization of Text-to-Image Diffusion.
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu.
CVPR 2023. [PDF]

Cones: Concept Neurons in Diffusion Models for Customized Generation.
Zhiheng Liu, Ruili Feng, Kai Zhu, Yifei Zhang, Kecheng Zheng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao.
ICML 2023. [PDF]

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.
Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou.
NeurIPS 2023. [PDF]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs.
Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani.
arXiv 2023. [PDF]

Orthogonal Adaptation for Modular Customization of Diffusion Models.
Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein.
arXiv 2023. [PDF]

Attention-based Integration

Cones 2: Customizable Image Synthesis with Multiple Subjects.
Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao.
arXiv 2023. [PDF]

Guidance Composition

Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models.
Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-cong Chen.
arXiv 2023. [PDF]

High-fidelity Person-centric Subject-to-Image Synthesis.
Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin.
arXiv 2023. [PDF]

Concept-centric Personalization with Large-scale Diffusion Priors.
Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song.
arXiv 2023. [PDF]

🔥Universal Controllable Generation

Universal Conditional Score Prediction

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models.
Sungnyun Kim, Junsoo Lee, Kibeom Hong, Daesik Kim, Namhyuk Ahn.
arXiv 2023. [PDF]

Generative Multimodal Models are In-Context Learners.
Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang.
arXiv 2023. [PDF]

Universal Condition-Guided Score Estimation

Universal Guidance for Diffusion Models.
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein.
CVPRW 2023. [PDF]

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang.
arXiv 2023. [PDF]

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method.
Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan.
arXiv 2023. [PDF]

License:MIT License