dk-liang / CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization

I want to ask about the output of your CLTR model

VietPT3502 opened this issue

I see that outputs['pred_logits'] has shape (batch_size, num_queries, num_classes), but outputs['pred_points'] has shape (batch_size, num_queries, 3). What does the 3 stand for? Also, is num_classes = 2, i.e. person head and background?

# the `num_classes` naming here is somewhat misleading.
# it indeed corresponds to `max_obj_id + 1`, where max_obj_id
# is the maximum id for a class in your dataset. For example,
# COCO has a max_obj_id of 90, so we pass `num_classes` to be 91.
# As another example, for a dataset that has a single class with id 1,
# you should pass `num_classes` to be 2 (max_obj_id + 1).
# For more details on this, check the following discussion
# https://github.com/facebookresearch/detr/issues/108#issuecomment-650269223
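As a quick illustration of the rule in the quoted comment, the helper below derives num_classes from the category ids used in a dataset. The function name is hypothetical and is not part of DETR or CLTR. It only encodes "max_obj_id + 1", so a single-class dataset (e.g. heads) whose class id is 1 gets num_classes = 2.

```python
def num_classes_from_ids(class_ids):
    """Return the DETR-style `num_classes` value: max_obj_id + 1.

    `class_ids` is any iterable of non-negative integer category ids
    from the dataset annotations (hypothetical helper, illustration only).
    """
    ids = list(class_ids)
    if not ids:
        raise ValueError("need at least one class id")
    if min(ids) < 0:
        raise ValueError("class ids must be non-negative")
    # Only the largest id matters; gaps in the id range are allowed.
    return max(ids) + 1


# COCO's largest category id is 90, so num_classes is 91.
coco_ids = [1, 2, 3, 90]  # a subset; only the maximum matters
print(num_classes_from_ids(coco_ids))  # 91

# A single-class dataset with id 1 gives num_classes = 2.
print(num_classes_from_ids([1]))  # 2
```

Note that the count of 2 comes from max_obj_id + 1, not from explicitly reserving a "background" id.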

The third number is the KNN distance.
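A minimal sketch of how one might read such an output. It assumes, without confirmation in this thread, that the first two values of each query are the point coordinates and the third is the KNN distance. The helper names, the choice of k, and the definition of "KNN distance" as the mean Euclidean distance from a point to its k nearest other points are illustrative assumptions, not CLTR's actual code.

```python
import math


def split_pred_points(pred_points):
    """Split one image's pred_points (a list of [a, b, c] per query)
    into coordinates and KNN distances.

    ASSUMPTION: values 0 and 1 are the point coordinates and value 2 is
    the KNN distance; check the CLTR source before relying on this.
    """
    coords = [(p[0], p[1]) for p in pred_points]
    knn_dist = [p[2] for p in pred_points]
    return coords, knn_dist


def knn_distance(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Hypothetical definition for illustration only; if fewer than k other
    points exist, all available neighbours are used.
    """
    result = []
    for i, (x, y) in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(
            math.hypot(x - ox, y - oy)
            for j, (ox, oy) in enumerate(points)
            if j != i
        )
        nearest = dists[:k]
        result.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return result


# Fake predictions for 2 queries of a single image.
coords, dists = split_pred_points([[0.1, 0.2, 0.05], [0.4, 0.5, 0.07]])
print(coords)  # [(0.1, 0.2), (0.4, 0.5)]
print(dists)   # [0.05, 0.07]

# Four head points on a unit square, k = 2: each point's two nearest
# neighbours are the adjacent corners at distance 1.
print(knn_distance([(0, 0), (1, 0), (0, 1), (1, 1)], k=2))  # [1.0, 1.0, 1.0, 1.0]
```

A per-point KNN distance like this gives a rough local density cue: small values indicate tightly packed heads, large values indicate sparse regions.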