fcjian / TOOD

TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

a little puzzled about the T-Head module

Xinbo-01 opened this issue

While reading your paper, I was a little puzzled about the T-Head module, and I hope you can clarify it.
Why can "N consecutive conv layers" extract task-interactive features? By comparison, do the features extracted by the preceding backbone+FPN contain no interactive information?

Our view is that the closer a feature is to the prediction layer, the richer its classification and localization information. In our method, the features extracted by the N consecutive conv layers are used directly to predict both classification and localization. These features therefore carry richer classification and localization information for task interaction than the features extracted by the preceding backbone+FPN.
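
To make the answer concrete, here is a minimal, dependency-free sketch of the idea, not the repository's implementation. It assumes each conv layer can be stood in for by a 1x1 convolution (a per-location linear map) followed by ReLU, uses random weights in place of learned ones, and omits the rest of the T-Head described in the paper. The names `t_head_sketch`, `make_linear`, and `apply_linear`, and the default `num_layers=3`, are hypothetical and chosen only for illustration.

```python
import random


def relu(vector):
    return [max(0.0, x) for x in vector]


def make_linear(in_dim, out_dim, rng):
    # Random weights standing in for a learned 1x1 conv kernel (assumption).
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weight, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def t_head_sketch(fpn_feature, num_layers=3, num_classes=3, seed=0):
    """fpn_feature: list of per-location channel vectors from one FPN level."""
    rng = random.Random(seed)
    channels = len(fpn_feature[0])
    # One shared stack of N layers; there is no separate per-task tower.
    shared_stack = [make_linear(channels, channels, rng) for _ in range(num_layers)]
    cls_weight = make_linear(channels, num_classes, rng)
    reg_weight = make_linear(channels, 4, rng)

    outputs = []
    for location in fpn_feature:
        x = location
        for weight in shared_stack:
            x = relu(apply_linear(weight, x))
        # Both predictions read the same feature x, so during training the
        # shared stack would receive gradients from both tasks.
        cls_scores = apply_linear(cls_weight, x)
        box_offsets = apply_linear(reg_weight, x)
        outputs.append((cls_scores, box_offsets))
    return outputs


if __name__ == "__main__":
    feature = [[0.5, -1.0, 2.0, 0.1], [1.0, 0.0, -0.5, 0.3]]
    for cls_scores, box_offsets in t_head_sketch(feature):
        print(len(cls_scores), len(box_offsets))
```

The point of the sketch is structural: the classification and localization outputs are computed from the same features sitting directly before the prediction layers, which is what the reply means by those features being richer for task interaction than the backbone+FPN output.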