MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

Jiahao Xie¹ Wei Li¹ Xiangtai Li¹ Ziwei Liu¹ Yew Soon Ong² Chen Change Loy¹

¹S-Lab, ²Nanyang Technological University

[arXiv]

We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train a variety of downstream detection and segmentation models and improve their performance, especially on rare and novel categories.

🤩 Key Properties

Training-free

Directly generates multiple objects

Agnostic to detection architectures

Requires no extra detectors or segmentors

😎 Method

MosaicFusion is a training-free, diffusion-based data augmentation pipeline that produces image and mask pairs containing multiple objects simultaneously, using off-the-shelf text-to-image diffusion models. The pipeline consists of two components: image generation and mask generation.
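To make the mask generation component more concrete, here is a minimal sketch of turning a per-pixel score map into a binary instance mask and a bounding box. This README does not describe how the masks are actually computed, so everything here is an assumption for illustration: the score map is presumed to be available already, and the names `threshold_mask` and `mask_to_box` and the `0.5` threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def threshold_mask(scores: List[List[float]], threshold: float = 0.5) -> Mask:
    """Binarize a 2D score map: 1 where score >= threshold, else 0.

    Hypothetical helper; the threshold value is an illustrative assumption.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in scores]


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the tight box (x0, y0, x1, y1) around foreground pixels.

    x1 and y1 are exclusive. Returns None for an empty mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Toy 3x3 score map standing in for one object's per-pixel scores.
scores = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.7, 0.2],
]
mask = threshold_mask(scores)
box = mask_to_box(mask)
```

A mask and box of this kind are what an instance segmentation annotation ultimately needs for each generated object; any refinement of the raw mask would sit between the two helper calls.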

🥰 Qualitative Examples

Given only the names of the categories of interest, MosaicFusion can generate high-quality multi-object images and masks simultaneously by conditioning each region on its own text prompt.
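The idea of one prompt per region can be sketched as follows. This is only an illustration under stated assumptions, not the repository's implementation: the 2x2 layout, the random split point, the prompt template, and the names `Region` and `make_mosaic_regions` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Region:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    prompt: str


def make_mosaic_regions(
    width: int,
    height: int,
    categories: Sequence[str],
    rng: random.Random,
    template: str = "a photo of a single {name}",  # assumed template
    min_ratio: float = 0.3,  # assumed bound keeping every region non-trivial
) -> List[Region]:
    """Split a canvas into four regions and attach one prompt to each."""
    if len(categories) != 4:
        raise ValueError("this sketch expects exactly four category names")
    # Sample a single split point; it defines a 2x2 grid of regions.
    cx = int(width * rng.uniform(min_ratio, 1.0 - min_ratio))
    cy = int(height * rng.uniform(min_ratio, 1.0 - min_ratio))
    boxes = [
        (0, 0, cx, cy),           # top-left
        (cx, 0, width, cy),       # top-right
        (0, cy, cx, height),      # bottom-left
        (cx, cy, width, height),  # bottom-right
    ]
    return [
        Region(*box, prompt=template.format(name=name))
        for box, name in zip(boxes, categories)
    ]


regions = make_mosaic_regions(
    512, 512, ["cat", "dog", "vase", "kite"], random.Random(0)
)
```

Each `Region` pairs a pixel box with the text prompt that would condition generation inside that box. The four boxes tile the canvas exactly, so every pixel belongs to one region.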

🤟 Citation

If you find this work useful for your research, please consider citing our paper:

@article{xie2023mosaicfusion,
  author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
  title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
  journal = {arXiv preprint arXiv:2309.13042},
  year = {2023}
}

🗞️ License

Distributed under the S-Lab License. See LICENSE for more information.
