haoranD / MosaicFusion

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

¹S-Lab, ²Nanyang Technological University

[arXiv]

We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.

🤩 Key Properties

  • Training-free
  • Directly generates multiple objects per image
  • Agnostic to detection architectures
  • Requires no extra detectors or segmentors

😎 Method

MosaicFusion is a training-free, diffusion-based data augmentation pipeline that produces image and mask pairs containing multiple objects simultaneously, using off-the-shelf text-to-image diffusion models. The overall pipeline consists of two components: image generation and mask generation.
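To make the "image and mask pairs" output more concrete, here is a minimal sketch of how one generated binary mask might be summarized as an annotation record for downstream training. The function name `mask_to_annotation`, the record fields, and the COCO-style `[x, y, width, height]` box convention are assumptions made for illustration; they are not taken from this repository.

```python
def mask_to_annotation(mask, category_name):
    """Summarize one binary mask as a COCO-style annotation record.

    `mask` is a list of rows holding 0/1 values. Hypothetical helper:
    the field names are illustrative, not the repository's format.
    Returns None for an empty mask so callers can drop failed generations.
    """
    # Row indices that contain at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    # Column index of every foreground pixel.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return {
        "category": category_name,
        # Tight box around the mask: [x, y, width, height].
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],
        # Area is the number of foreground pixels.
        "area": sum(1 for row in mask for value in row if value),
    }


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
print(mask_to_annotation(example_mask, "kite"))
# {'category': 'kite', 'bbox': [1, 1, 2, 2], 'area': 3}
```

Returning `None` for an empty mask is one simple way to filter out regions where no usable object mask was produced; a real pipeline may apply stricter filtering.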

🥰 Qualitative Examples

Given only the names of the categories of interest, MosaicFusion can generate high-quality multi-object images and masks simultaneously by conditioning each region of the image on its own text prompt.
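The idea of one text prompt per region can be sketched as follows. This is only an illustration under stated assumptions: the 2x2 layout, the jitter range, the prompt template, and the names `Region` and `make_mosaic_regions` are all hypothetical and do not come from this repository. The sketch covers only the layout and prompt assignment, not the diffusion sampling itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An axis-aligned canvas region (x1/y1 exclusive) and its text prompt."""
    x0: int
    y0: int
    x1: int
    y1: int
    prompt: str


def make_mosaic_regions(width, height, categories, seed=0, jitter=0.2):
    """Split a canvas into four regions around a jittered center point.

    Hypothetical helper: the 2x2 layout, jitter range, and prompt template
    are illustrative assumptions, not the repository's actual settings.
    """
    if len(categories) != 4:
        raise ValueError("this sketch expects exactly four category names")
    rng = random.Random(seed)
    # Shift the shared corner away from the canvas center by at most
    # `jitter` of each side length, so regions differ in size.
    cx = int(width * (0.5 + rng.uniform(-jitter, jitter)))
    cy = int(height * (0.5 + rng.uniform(-jitter, jitter)))
    boxes = [
        (0, 0, cx, cy),           # top-left
        (cx, 0, width, cy),       # top-right
        (0, cy, cx, height),      # bottom-left
        (cx, cy, width, height),  # bottom-right
    ]
    return [
        Region(x0, y0, x1, y1, f"a photo of a single {name}")
        for (x0, y0, x1, y1), name in zip(boxes, categories)
    ]


for region in make_mosaic_regions(512, 512, ["cat", "kite", "vase", "drum"]):
    print(region)
```

Each `Region` pairs a box with the prompt that would condition generation inside it. Because the four boxes tile the canvas exactly, every pixel belongs to exactly one prompt in this simplified layout.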

🤟 Citation

If you find this work useful for your research, please consider citing our paper:

@article{xie2023mosaicfusion,
  author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
  title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
  journal = {arXiv preprint arXiv:2309.13042},
  year = {2023}
}

🗞️ License

Distributed under the S-Lab License. See LICENSE for more information.