mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

Possibility to Change the Text Encoder?

mcairlangga2 opened this issue

Dear Authors,
Thank you for the great work.

I have a question. Is it possible to replace the RoBERTa text encoder with another model, such as CLIP? Have you considered or tried other text encoders? If it is possible, how would I change it in the code?

Thank you!

Hi @mcairlangga2,

Yes, it is possible; however, we did not consider this research direction. With the recent advancements in NLP and vision-language modeling, it is a problem worth exploring.

As far as the implementation is concerned, it would mean replacing RoBERTa in this file with CLIP or another text encoder. Good luck!
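
For anyone attempting the swap, here is a minimal sketch of what it might look like. It assumes the text encoder is loaded through the Hugging Face `transformers` library; the thread does not confirm how the repository loads it. The `TextEncoderSpec` class, the `SPECS` registry, and `build_text_encoder` are hypothetical names introduced only for illustration, not code from the repository.

```python
import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEncoderSpec:
    """Names of the Hugging Face classes and checkpoint for one text encoder."""
    tokenizer_cls: str
    model_cls: str
    checkpoint: str


# Hypothetical registry for illustration; it is not part of the repository.
SPECS = {
    "roberta": TextEncoderSpec("RobertaTokenizerFast", "RobertaModel", "roberta-base"),
    "clip": TextEncoderSpec(
        "CLIPTokenizerFast", "CLIPTextModel", "openai/clip-vit-base-patch32"
    ),
}


def build_text_encoder(name: str):
    """Return (tokenizer, model, hidden_size) for the requested encoder."""
    if name not in SPECS:
        raise ValueError(f"unknown text encoder {name!r}; choose from {sorted(SPECS)}")
    spec = SPECS[name]
    # Imported lazily so the registry can be inspected without transformers installed.
    transformers = importlib.import_module("transformers")
    tokenizer = getattr(transformers, spec.tokenizer_cls).from_pretrained(spec.checkpoint)
    model = getattr(transformers, spec.model_cls).from_pretrained(spec.checkpoint)
    # The hidden width differs between encoders, so any layer that projects text
    # features into the detector should be sized from this value, not hard-coded.
    return tokenizer, model, model.config.hidden_size
```

Calling `build_text_encoder("clip")` downloads the checkpoint on first use. Two differences are likely to matter in practice: `roberta-base` produces 768-dimensional token features while this CLIP text model produces 512-dimensional ones, and CLIP's text encoder accepts at most 77 tokens. Detector weights trained alongside RoBERTa would presumably need retraining or fine-tuning after the swap.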