a little puzzled about the T-Head module
Xinbo-01 opened this issue
When reading your paper, I was a little puzzled about the T-Head module, and I hope to get your answer.
Why can "N consecutive conv layers" extract the task-interactive features? By comparison, do the features extracted by the preceding backbone+FPN contain no interactive information?
We hold the view that the closer features are to the prediction layer, the richer their classification and localization information. In our method, the features extracted by the N consecutive conv layers are used directly to predict both classification and localization. Therefore, these features carry richer classification and localization information for task interaction than the features extracted by the preceding backbone+FPN.
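To make the answer above concrete, here is a minimal sketch of a shared stack of N conv layers whose output feeds both predictions. It is an illustration under stated assumptions, not the paper's implementation: the class name `SharedTowerHead`, the helper names, the channel counts, the plain ReLU activation, the 3x3 prediction convs, and the random weights are all hypothetical, and the real T-Head may differ in its details.

```python
import random


def conv2d(x, weight, bias, relu):
    """Stride-1, zero-padded 2D convolution (cross-correlation, as in most DL frameworks).

    x:      nested lists shaped [C_in][H][W]
    weight: nested lists shaped [C_out][C_in][k][k]
    bias:   list of length C_out
    """
    k = len(weight[0][0])
    pad = k // 2
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for o in range(len(weight)):
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = bias[o]
                for c in range(c_in):
                    for di in range(k):
                        ii = i + di - pad
                        if not 0 <= ii < h:
                            continue  # zero padding: rows outside the map add nothing
                        for dj in range(k):
                            jj = j + dj - pad
                            if 0 <= jj < w:
                                acc += weight[o][c][di][dj] * x[c][ii][jj]
                plane[i][j] = max(acc, 0.0) if relu else acc
        out.append(plane)
    return out


def make_conv(rng, c_out, c_in, k):
    """Random weights and zero biases for one conv layer (illustrative values only)."""
    weight = [[[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
               for _ in range(c_in)] for _ in range(c_out)]
    return weight, [0.0] * c_out


class SharedTowerHead:
    """Hypothetical head: N shared conv layers, then two predictors on the SAME features."""

    def __init__(self, channels, num_layers, num_classes, seed=0):
        rng = random.Random(seed)
        # The N consecutive conv layers shared by both tasks.
        self.tower = [make_conv(rng, channels, channels, 3) for _ in range(num_layers)]
        # Both predictors read the tower output, so during training the tower
        # would receive gradients from classification and localization alike.
        self.cls_pred = make_conv(rng, num_classes, channels, 3)
        self.reg_pred = make_conv(rng, 4, channels, 3)  # 4 box values per location

    def forward(self, fpn_feat):
        x = fpn_feat
        for weight, bias in self.tower:
            x = conv2d(x, weight, bias, relu=True)
        cls_logits = conv2d(x, *self.cls_pred, relu=False)
        box_deltas = conv2d(x, *self.reg_pred, relu=False)
        return x, cls_logits, box_deltas


# Toy input standing in for one FPN level: 8 channels on a 4 x 5 map.
rng = random.Random(1)
fpn_feat = [[[rng.random() for _ in range(5)] for _ in range(4)] for _ in range(8)]

head = SharedTowerHead(channels=8, num_layers=2, num_classes=3)
shared, cls_logits, box_deltas = head.forward(fpn_feat)
print(len(shared), len(cls_logits), len(box_deltas))
```

The point of the sketch is structural: `cls_logits` and `box_deltas` are computed from the same `shared` tensor, which sits one layer away from both predictions, whereas the backbone+FPN features enter only as the input to the tower.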