mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

How long does the training take?

lxtGH opened this issue · comments

Hi @mmaaz60! Thanks for open-sourcing this project. How long does the training procedure take, and what kind of device did you use?

Hi @lxtGH,

Thank you for your interest in our work. We train the MAVL model on approximately 1.3M aligned image-text pairs. The dataset is the same as the one used in MDETR. The training took almost 3 days on 32 V100 GPUs. Following this, the model is evaluated on diverse datasets (Tables 1 and 2) for class-agnostic object detection without any finetuning.
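For a rough sense of the compute budget, the figures above (almost 3 days on 32 V100 GPUs) can be converted to GPU-hours. This is only a back-of-the-envelope sketch: the helper names `gpu_hours` and `estimated_days` are hypothetical, 3 days is treated as an upper bound, and the projection to a smaller GPU count assumes perfectly linear scaling, which real multi-GPU training rarely achieves.

```python
def gpu_hours(days: float, num_gpus: int) -> float:
    """Total GPU-hours for a run of `days` wall-clock days on `num_gpus` GPUs."""
    return days * 24 * num_gpus


def estimated_days(total_gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock days for the same budget on `num_gpus` GPUs.

    Assumption: perfectly linear scaling (no communication overhead,
    same effective batch size), so treat the result as a rough guide.
    """
    return total_gpu_hours / (num_gpus * 24)


# Reported setup: almost 3 days on 32 V100 GPUs (3 days used as an upper bound).
total = gpu_hours(3, 32)
print(total)                     # 2304
print(estimated_days(total, 8))  # 12.0
```

Under these assumptions, the reported run amounts to at most about 2,304 V100 GPU-hours, so the same budget on 8 V100 GPUs would take on the order of 12 days.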

Thanks for your reply.