a little puzzled about the T-Head module

Question

Xinbo-01 opened this issue 3 years ago · comments

When reading your paper, I was a little puzzled about the T-Head module, and I hope to get your answer.
Why can "N consecutive conv layers" extract task-interactive features? By comparison, do the features extracted by the preceding backbone+FPN contain no interactive information?

jianpursuit · Answer 1 · Thu Oct 28 2021 16:03:47 GMT+0800 (China Standard Time)

Our view is that the closer a feature is to the prediction layer, the richer its classification and localization information. In our method, the features extracted by the N consecutive conv layers are used directly to predict both classification and localization. These features therefore carry richer classification and localization information for task interaction than the features extracted by the preceding backbone+FPN.
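
To make the point concrete, here is a minimal PyTorch sketch of the idea described above: a single stack of N conv layers whose outputs feed both the classification and the localization predictors, so the gradients of both tasks shape the same features. This is only an illustration, not the repository's actual T-Head. The class name `SharedTowerHead`, the channel sizes, the default `num_convs`, and the plain concatenation of the intermediate features are all assumptions, and any task-specific processing applied after the shared stack is left out.

```python
import torch
import torch.nn as nn


class SharedTowerHead(nn.Module):
    """Hypothetical head: one shared conv tower feeds both tasks."""

    def __init__(self, in_channels=256, feat_channels=256,
                 num_convs=6, num_classes=80):
        super().__init__()
        # N consecutive conv layers, each followed by an activation.
        self.tower = nn.ModuleList()
        for i in range(num_convs):
            ch = in_channels if i == 0 else feat_channels
            self.tower.append(nn.Sequential(
                nn.Conv2d(ch, feat_channels, kernel_size=3, padding=1),
                nn.ReLU(),
            ))
        # Both predictors read the same concatenated tower features
        # (assumption: simple concatenation, no per-task re-weighting).
        cat_channels = feat_channels * num_convs
        self.cls_pred = nn.Conv2d(cat_channels, num_classes, 3, padding=1)
        self.reg_pred = nn.Conv2d(cat_channels, 4, 3, padding=1)

    def forward(self, fpn_feat):
        inter_feats = []
        x = fpn_feat
        for layer in self.tower:
            x = layer(x)
            inter_feats.append(x)  # keep the output of every tower layer
        feat = torch.cat(inter_feats, dim=1)
        # Losses on both outputs back-propagate into the shared tower,
        # which is what makes its features "task-interactive".
        return self.cls_pred(feat), self.reg_pred(feat), feat


# Usage with small, arbitrary sizes.
head = SharedTowerHead(in_channels=8, feat_channels=8,
                       num_convs=3, num_classes=5)
cls_out, reg_out, feat = head(torch.randn(2, 8, 16, 16))
print(cls_out.shape, reg_out.shape, feat.shape)
```

The backbone+FPN features are also trained by both tasks, but they sit further from the prediction layers; in this sketch the tower outputs are the last features shared by the two predictors.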