RoI head for Vision Transformer

Question

yuxin212 opened this issue 2 years ago · comments

🚀 Feature

Please consider adding an RoI head for Vision Transformer models, so that they can be used for action detection.

Motivation

MViT performs better on the AVA dataset than conv-net-based methods such as Slow/ResNet, but RoI heads are currently implemented only for Slow and SlowFast.

Pitch

A function or class, similar to the ResNet RoI head, that creates an RoI head for Vision Transformer.

Vivek Krishnamurthy · Answer 1 · Sun Oct 16 2022 02:27:55 GMT+0800 (China Standard Time)

Took a brief look at this.

I think we could use the RoI code found here.

There are some differences but I think it's a good starting point. Curious to know your thoughts!