mmaaz60 / mdef_detr


MDef-DETR: Multi-modal Deformable Detection Transformer

This repository contains the training code for MDef-DETR. The paper is available on arXiv.

Requirements

pip install -r requirements.txt

Training

Distributed training is available via Slurm and submitit:

pip install submitit

The config file for pretraining is configs/pretrain.json and looks like:

{
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["flickr", "gqa", "refexp"],
    "coco_path": "",
    "vg_img_path": "",
    "flickr_img_path": "",
    "refexp_ann_path": "annotations/",
    "flickr_ann_path": "annotations/",
    "gqa_ann_path": "annotations/",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": ""
}
  • Download the original Flickr30k image dataset from the Flickr30K webpage and update flickr_img_path to point to the folder containing the images.
  • Download the original Flickr30k entities annotations from Flickr30k annotations and update flickr_dataset_path to point to the folder containing the annotations.
  • Download the GQA images from GQA images and update vg_img_path to point to the folder containing the images.
  • Download the COCO images from COCO train2014 and update coco_path to point to the folder containing the downloaded images.
  • Download the pre-processed annotations, converted to COCO format (all datasets are in the same zip file for MDETR annotations), from Pre-processed annotations and update flickr_ann_path, gqa_ann_path and refexp_ann_path to point to this folder.

Alternatively, you can download the preprocessed data as a single zip file from the provided link and extract it under the data directory.
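Before launching training, it can help to verify that every path filled into the config actually exists on disk. A small, hypothetical helper (not part of this repo) for that check:

```python
import json
from pathlib import Path

# Keys in configs/pretrain.json that must point to existing directories.
PATH_KEYS = [
    "coco_path", "vg_img_path", "flickr_img_path",
    "refexp_ann_path", "flickr_ann_path", "gqa_ann_path",
    "flickr_dataset_path",
]

def missing_paths(config):
    """Return the config keys whose paths are empty or absent on disk."""
    return [k for k in PATH_KEYS
            if not config.get(k) or not Path(config[k]).exists()]

# Example with a still-unfilled config: the unset keys are reported.
example = json.loads('{"coco_path": "", "refexp_ann_path": "annotations/"}')
print(missing_paths(example))
```

Running it against your edited `configs/pretrain.json` (via `json.loads(Path("configs/pretrain.json").read_text())`) should print an empty list once all datasets are in place.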

Script to run training

The following command reproduces the training of the ResNet-101 model.

python run_with_submitit.py --dataset_config configs/pretrain.json  --ngpus 8 --nodes 4 --ema --epochs 20 --lr_drop 16
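The `--lr_drop 16` flag follows a DETR-style step schedule: the learning rate is decayed once, at epoch 16 of the 20 epochs. A sketch of that schedule (the 0.1 decay factor is an assumption; check the scheduler setup in the training code for the actual value):

```python
def lr_at_epoch(base_lr, epoch, lr_drop=16, gamma=0.1):
    """Step schedule: multiply the LR by `gamma` once epoch >= lr_drop."""
    return base_lr * (gamma if epoch >= lr_drop else 1.0)

print(lr_at_epoch(1e-4, 0))    # base LR for epochs 0..15
print(lr_at_epoch(1e-4, 16))   # decayed LR for epochs 16..19
```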

Citation

If you use our work, please consider citing MDef-DETR:

    @article{Maaz2021Multimodal,
        title={Multi-modal Transformers Excel at Class-agnostic Object Detection},
        author={Muhammad Maaz and Hanoona Rasheed and Salman Khan and Fahad Shahbaz Khan and Rao Muhammad Anwer and Ming-Hsuan Yang},
        journal={arXiv preprint arXiv:2111.11430},
        year={2021}
    }

Credits

This codebase is modified from the MDETR repository. We thank the authors for releasing their implementation.

About

License: Apache License 2.0

