A curated list of resources (papers, datasets, and relevant links) on generative image composition, which aims to generate a plausible composite image from a background image (optionally with a bounding box) and one or more foreground images of a specific object.
Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.
A brief review on generative image composition is included in the following survey on image composition:
Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv preprint arXiv:2106.14490 (2021). [arXiv]
- COCOEE (within-domain, single-ref): 500 background images from the MSCOCO validation set. Each background image has a bounding box and a foreground image from the MSCOCO training set.
- TF-ICON test benchmark (cross-domain, single-ref): 332 samples. Each sample consists of a background image, a foreground image, a user mask, and a text prompt.
- FOSCom (within-domain, single-ref): 640 background images from the Internet. Each background image has a manually annotated bounding box and a foreground image from the MSCOCO training set.
- DreamEditBench (within-domain, multi-ref): 220 background images and 30 unique foreground objects from 15 categories.
- MureCom (within-domain, multi-ref): 640 background images and 96 unique foreground objects from 32 categories.
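The benchmarks above share a common sample structure (background, foreground reference(s), and optional box, mask, or prompt). A minimal sketch of how one sample might be represented in code; the class and field names are illustrative assumptions, not the benchmarks' actual file layout:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class CompositionSample:
    """Hypothetical record for one generative-composition benchmark sample."""
    background_path: str                        # path to the background image
    foreground_paths: List[str]                 # one path (single-ref) or several (multi-ref)
    bbox: Optional[Tuple[int, int, int, int]] = None  # (x1, y1, x2, y2) placement box, if annotated
    mask_path: Optional[str] = None             # user mask (e.g., TF-ICON benchmark)
    prompt: Optional[str] = None                # text prompt (e.g., TF-ICON benchmark)

# Example: a COCOEE-style single-ref sample with an annotated box
sample = CompositionSample(
    background_path="bg/000001.jpg",
    foreground_paths=["fg/000001.jpg"],
    bbox=(32, 48, 196, 220),
)
```

Multi-ref benchmarks such as DreamEditBench and MureCom would simply populate `foreground_paths` with several reference images of the same object.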
The training set is open (methods may use any training data); the test set is the COCOEE benchmark.
| Method | CLIP↑ (FG) | DINO↑ (FG) | FID↓ (FG) | LSSIM↑ (BG) | LPIPS↓ (BG) | FID↓ (Overall) | QS↑ (Overall) |
|---|---|---|---|---|---|---|---|
| Inpaint&Paste | - | - | 8.0 | - | - | 3.64 | 72.07 |
| SDEdit | 85.02 | - | 9.77 | 0.630 | 0.344 | 6.42 | 75.20 |
| PBE | 84.84 | - | 6.24 | 0.823 | 0.116 | 3.18 | 77.80 |
| ObjectStitch | 85.97 | - | 6.86 | 0.825 | 0.116 | 3.35 | 76.86 |
| ControlCom | 88.31 | - | 6.28 | 0.826 | 0.114 | 3.19 | 77.84 |
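The foreground CLIP and DINO columns measure embedding similarity between the reference foreground and the generated foreground region. A minimal sketch of such a score, assuming the embeddings have already been extracted (the function names and the 0–100 scaling are illustrative assumptions, not the leaderboard's exact evaluation code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def foreground_score(ref_embedding, gen_embedding):
    """CLIP/DINO-style score: cosine similarity scaled to a 0-100 range."""
    return 100.0 * cosine_similarity(ref_embedding, gen_embedding)

# Identical embeddings give the maximum score of 100.0
print(foreground_score([1.0, 0.0, 0.5], [1.0, 0.0, 0.5]))
```

The remaining columns are standard image metrics: LSSIM/LPIPS compare the generated background against the input background, while FID and QS assess distributional realism and image quality.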
- Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Re, Kayvon Fatahalian: "Collage Diffusion." WACV (2024) [pdf] [code]
- Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan: "CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models." arXiv preprint arXiv:2310.19784 (2023) [arXiv] [code]
- Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu: "ControlCom: Controllable Image Composition using Diffusion Model." arXiv preprint arXiv:2308.10040 (2023) [arXiv] [code] [demo]
- Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao: "AnyDoor: Zero-shot Object-level Image Customization." CVPR (2024) [arXiv] [code] [demo]
- Xin Zhang, Jiaxian Guo, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa: "Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model." arXiv preprint arXiv:2306.07596 (2023) [arXiv] [code]
- Roy Hachnochi, Mingrui Zhao, Nadav Orzech, Rinon Gal, Ali Mahdavi-Amiri, Daniel Cohen-Or, Amit Haim Bermano: "Cross-domain Compositing with Pretrained Diffusion Models." arXiv preprint arXiv:2302.10167 (2023) [arXiv] [code]
- Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong: "TF-ICON: Diffusion-based Training-free Cross-domain Image Composition." ICCV (2023) [pdf] [code]
- Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen: "Paint by Example: Exemplar-based Image Editing with Diffusion Models." CVPR (2023) [arXiv] [code] [demo]
- Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga: "ObjectStitch: Generative Object Compositing." CVPR (2023) [arXiv] [code]
- Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh: "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes." CVPR (2023) [paper] [code]
- Lingxiao Lu, Bo Zhang, Li Niu: "DreamCom: Finetuning Text-guided Inpainting Model for Image Composition." arXiv preprint arXiv:2309.15508 (2023) [arXiv] [code]
- Tianle Li, Max Ku, Cong Wei, Wenhu Chen: "DreamEdit: Subject-driven Image Editing." TMLR (2023) [arXiv] [code]
- Jinghao Zhou, Tomas Jakab, Philip Torr, Christian Rupprecht: "Scene-Conditional 3D Object Stylization and Composition." arXiv preprint arXiv:2312.12419 (2023) [arXiv] [code]
- Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari: "InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes." arXiv preprint arXiv:2401.05335 (2024) [arXiv]
- Rahul Goel, Dhawal Sirikonda, Saurabh Saini, PJ Narayanan: "Interactive Segmentation of Radiance Fields." CVPR (2023) [arXiv] [code]
- Rahul Goel, Dhawal Sirikonda, Rajvi Shah, PJ Narayanan: "FusedRF: Fusing Multiple Radiance Fields." CVPR Workshop (2023) [arXiv]
- Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll: "Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation." WACV (2023) [arXiv]
- Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng: "Compressible-composable NeRF via Rank-residual Decomposition." NeurIPS (2022) [arXiv] [code]
- Bangbang Yang, Yinda Zhang, Yinghao Xu, Yijin Li, Han Zhou, Hujun Bao, Guofeng Zhang, Zhaopeng Cui: "Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering." ICCV (2021) [arXiv] [code]
- Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang: "ActAnywhere: Subject-Aware Video Background Generation." arXiv preprint arXiv:2401.10822 (2024) [arXiv]