rayson-chan / Awesome-Generative-Image-Composition

A curated list of papers, code, and resources pertaining to generative image composition.

Awesome Generative Image Composition

A curated list of resources, including papers, datasets, and relevant links, pertaining to generative image composition. The task aims to generate a plausible composite image from a background image (with an optional bounding box) and one or more foreground images of a specific object.
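To make the task inputs concrete, the sketch below implements the trivial cut-and-paste baseline (no generation or harmonization): the foreground is resized to the bounding box and pasted onto the background. Generative composition methods replace the paste step with a model that synthesizes the object coherently inside the box. All names here are illustrative, not taken from any paper's code.

```python
import numpy as np

def naive_compose(background, foreground, bbox):
    """Cut-and-paste baseline for image composition.

    background: (H, W, 3) uint8 array
    foreground: (h, w, 3) uint8 array of the object
    bbox: (x, y, box_w, box_h) placement region on the background
    """
    x, y, bw, bh = bbox
    h, w = foreground.shape[:2]
    # Nearest-neighbour resize of the foreground to the bbox size.
    rows = np.arange(bh) * h // bh
    cols = np.arange(bw) * w // bw
    resized = foreground[rows[:, None], cols[None, :]]
    out = background.copy()
    out[y:y + bh, x:x + bw] = resized
    return out
```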

Contributing

Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.

Table of Contents

  • Survey
  • Evaluation Metrics
  • Test Set
  • Leaderboard
  • Papers
  • Related Topics
  • Other Resources

Survey

A brief review on generative image composition is included in the following survey on image composition:

Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv preprint arXiv:2106.14490 (2021). [arXiv]

Evaluation Metrics

Test Set

  • COCOEE (within-domain, single-ref): 500 background images from MSCOCO validation set. Each background image has a bounding box and a foreground image from MSCOCO training set.
  • TF-ICON test benchmark (cross-domain, single-ref): 332 samples. Each sample consists of a background image, a foreground image, a user mask, and a text prompt.
  • FOSCom (within-domain, single-ref): 640 background images from the Internet. Each background image has a manually annotated bounding box and a foreground image from MSCOCO training set.
  • DreamEditBench (within-domain, multi-ref): 220 background images and 30 unique foreground objects from 15 categories.
  • MureCom (within-domain, multi-ref): 640 background images and 96 unique foreground objects from 32 categories.
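Several of the benchmarks above pair a background image with a bounding box rather than a pixel mask. Mask-conditioned models then consume the box as a binary mask, as in the sketch below. The (x, y, w, h) box convention is only illustrative; layout conventions vary between benchmarks.

```python
import numpy as np

def bbox_to_mask(height, width, bbox):
    """Convert a bounding box into a binary mask.

    bbox: (x, y, box_w, box_h); returns a (height, width) array with 1
    inside the box and 0 elsewhere.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    x, y, bw, bh = bbox
    mask[y:y + bh, x:x + bw] = 1
    return mask
```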

Leaderboard

The training set is open; the test set is the COCOEE benchmark. CLIP↑ and DINO↑ measure foreground fidelity, FID↓/LSSIM↑/LPIPS↓ measure background preservation, and the final FID↓ and QS↑ score the overall composite.

Method          CLIP↑   DINO↑   FID↓   LSSIM↑   LPIPS↓   FID↓   QS↑
Inpaint&Paste   -       -       8.0    -        -        3.64   72.07
SDEdit          85.02   -       9.77   0.630    0.344    6.42   75.20
PBE             84.84   -       6.24   0.823    0.116    3.18   77.80
ObjectStitch    85.97   -       6.86   0.825    0.116    3.35   76.86
ControlCom      88.31   -       6.28   0.826    0.114    3.19   77.84
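The foreground scores in the leaderboard are, in spirit, embedding similarities: the generated foreground and the reference foreground are encoded by a pretrained CLIP (or DINO) image encoder, and the cosine similarity of the two embeddings is reported. The sketch below abstracts the encoder away; `feat_a` and `feat_b` stand for its outputs, and the ×100 scaling matches the convention in the table.

```python
import numpy as np

def embedding_similarity(feat_a, feat_b):
    """Cosine similarity between two feature vectors, scaled to [0, 100].

    In CLIP/DINO foreground metrics, feat_a and feat_b would be the
    encoder embeddings of the generated and reference foregrounds.
    """
    a = np.asarray(feat_a, dtype=np.float64)
    b = np.asarray(feat_b, dtype=np.float64)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 100.0 * cos
```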

Papers

Object-to-Object

  • Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Re, Kayvon Fatahalian: "Collage Diffusion." WACV (2024) [pdf] [code]
  • Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan: "CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models." arXiv preprint arXiv:2310.19784 (2023) [arXiv] [code]
  • Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu: "ControlCom: Controllable Image Composition using Diffusion Model." arXiv preprint arXiv:2308.10040 (2023) [arXiv] [code] [demo]
  • Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao: "AnyDoor: Zero-shot Object-level Image Customization." CVPR (2024) [arXiv] [code] [demo]
  • Xin Zhang, Jiaxian Guo, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa: "Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model." arXiv preprint arXiv:2306.07596 (2023) [arXiv] [code]
  • Roy Hachnochi, Mingrui Zhao, Nadav Orzech, Rinon Gal, Ali Mahdavi-Amiri, Daniel Cohen-Or, Amit Haim Bermano: "Cross-domain Compositing with Pretrained Diffusion Models." arXiv preprint arXiv:2302.10167 (2023) [arXiv] [code]
  • Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong: "TF-ICON: Diffusion-based Training-free Cross-domain Image Composition." ICCV (2023) [pdf] [code]
  • Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen: "Paint by Example: Exemplar-based Image Editing with Diffusion Models." CVPR (2023) [arXiv] [code] [demo]
  • Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga: "ObjectStitch: Generative Object Compositing." CVPR (2023) [arXiv] [code]
  • Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh: "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes." CVPR (2023) [paper] [code]

Token-to-Object

  • Lingxiao Lu, Bo Zhang, Li Niu: "DreamCom: Finetuning Text-guided Inpainting Model for Image Composition." arXiv preprint arXiv:2309.15508 (2023) [arXiv] [code]

  • Tianle Li, Max Ku, Cong Wei, Wenhu Chen: "DreamEdit: Subject-driven Image Editing." TMLR (2023) [arXiv] [code]

Related Topics

Foreground: 3D; Background: image

  • Jinghao Zhou, Tomas Jakab, Philip Torr, Christian Rupprecht: "Scene-Conditional 3D Object Stylization and Composition." arXiv preprint arXiv:2312.12419 (2023) [arXiv] [code]

Foreground: 3D; Background: 3D

  • Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari: "InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes." arXiv preprint arXiv:2401.05335 (2024) [arXiv]
  • Rahul Goel, Dhawal Sirikonda, Saurabh Saini, PJ Narayanan: "Interactive Segmentation of Radiance Fields." CVPR (2023) [arXiv] [code]
  • Rahul Goel, Dhawal Sirikonda, Rajvi Shah, PJ Narayanan: "FusedRF: Fusing Multiple Radiance Fields." CVPR Workshop (2023) [arXiv]
  • Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll: "Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation." WACV (2023) [arXiv]
  • Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng: "Compressible-composable NeRF via Rank-residual Decomposition." NeurIPS (2022) [arXiv] [code]
  • Bangbang Yang, Yinda Zhang, Yinghao Xu, Yijin Li, Han Zhou, Hujun Bao, Guofeng Zhang, Zhaopeng Cui: "Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering." ICCV (2021) [arXiv] [code]

Foreground: video; Background: image

  • Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang: "ActAnywhere: Subject-Aware Video Background Generation." arXiv preprint arXiv:2401.10822 (2024) [arXiv]

Foreground: video; Background: video

  • Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song: "Training-Free Semantic Video Composition via Pre-trained Diffusion Model." arXiv preprint arXiv:2401.09195 (2024) [arXiv]

  • Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang: "Inserting Videos into Videos." CVPR (2019) [pdf]

Other Resources
