dk-liang / CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization

How to understand Object Queries in Crowd Counting task?

congyi-lcy opened this issue.

I would like to ask the author how to interpret object queries.

From DETR, we know that the transformer decoder takes a fixed set of N object queries and produces N predictions in a single pass. N is a preset integer that must be larger than the number of objects in the image, and this N is what is meant by the number of object queries.
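To make the constraint concrete, here is a minimal sketch of DETR-style counting. It assumes each object query yields one (x, y, score) point and that a point is counted when its score exceeds a threshold; the helper `count_from_queries` and the 0.5 threshold are hypothetical, not taken from the CLTR code. However crowded the image, the count cannot exceed N.

```python
import random
from typing import List, Tuple

# One (x, y, score) tuple per object query; the list always has length N,
# no matter how many objects are actually in the image. (Hypothetical format.)
Prediction = Tuple[float, float, float]

def count_from_queries(preds: List[Prediction], threshold: float = 0.5) -> int:
    """Count predictions whose confidence exceeds `threshold`.

    Each query yields at most one point, so the result is at most len(preds) == N.
    """
    return sum(1 for _, _, score in preds if score > threshold)

N = 500  # number of object queries: a fixed hyperparameter
rng = random.Random(0)
# Simulate a very dense image where every query fires with high confidence:
# even then, the predicted count saturates at N.
preds = [(rng.random(), rng.random(), 0.9) for _ in range(N)]
print(count_from_queries(preds))  # 500
```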

However, in crowd counting, a single image often contains thousands or even tens of thousands of people, yet the author sets the number of object queries to 700 or 500. How should I understand this?
If object queries are defined as in DETR, shouldn't N be set to several thousand? That seems strange. Could the author share their understanding of object queries? I would appreciate it.

Hi, the way I see it, the code actually crops each image into 12 crops of size 256x256, so any image you have is effectively treated as a batch of images. Each crop still satisfies your assumption about how the queries work, i.e. within each crop you can predict at most 700/500 people.
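A minimal sketch of this kind of tiling, assuming the image is padded so each side is a multiple of 256 and then split into non-overlapping 256x256 patches. `tile_boxes` is a hypothetical helper, not a function from the CLTR repository. It illustrates why the per-image budget is the number of patches times N rather than N itself.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in padded-image pixels

def tile_boxes(height: int, width: int, patch: int = 256) -> List[Box]:
    """Pad (height, width) up to multiples of `patch`, then split into non-overlapping tiles."""
    padded_h = math.ceil(height / patch) * patch
    padded_w = math.ceil(width / patch) * patch
    return [
        (top, left, top + patch, left + patch)
        for top in range(0, padded_h, patch)
        for left in range(0, padded_w, patch)
    ]

num_queries = 500  # per-patch query budget
boxes = tile_boxes(768, 1024)
print(len(boxes))                # 12 (3 rows x 4 columns)
print(len(boxes) * num_queries)  # 6000: max points predictable for the whole image
```

For a hypothetical 768x1024 image this yields 12 patches, so with 500 queries per patch the model can output up to 6000 points for the whole image, even though any single patch is capped at 500.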

Thanks for the clarification.

I believe you haven't reviewed the code carefully. During training, the author requires that each cropped 256x256 image contain more than 0 and fewer than 500 GT points; otherwise, the image is re-cropped. During testing, the author pads the sides whose lengths are not divisible by 256 and then splits the image into multiple 256x256 patches, but each patch still uses at most 500 queries. I think there is a problem with this: a test patch with more than 500 people cannot be fully counted. @Faisal-Hajari Can you explain that? @dk-liang
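The training-time rule can be sketched as rejection sampling over random crops. This is an illustration under assumptions, not the CLTR implementation: points are (x, y) pixel coordinates, and `max_tries` is an added safeguard (not mentioned in the thread) that stops the loop on images where no valid crop exists.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) head annotation in pixels

def sample_crop(
    points: Sequence[Point],
    height: int,
    width: int,
    crop: int = 256,
    max_gt: int = 500,
    max_tries: int = 100,  # hypothetical safeguard against looping forever
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, int, List[Point]]]:
    """Re-crop until 0 < #GT points < max_gt; return (top, left, crop-local points)."""
    rng = rng if rng is not None else random.Random()
    for _ in range(max_tries):
        # Random top-left corner such that the crop stays inside the image.
        top = rng.randint(0, max(height - crop, 0))
        left = rng.randint(0, max(width - crop, 0))
        inside = [
            (x - left, y - top)  # shift into crop-local coordinates
            for x, y in points
            if left <= x < left + crop and top <= y < top + crop
        ]
        if 0 < len(inside) < max_gt:
            return top, left, inside
    return None  # no valid crop found; the caller decides (e.g. skip the image)

rng = random.Random(0)
pts = [(rng.uniform(0, 1024), rng.uniform(0, 768)) for _ in range(300)]
result = sample_crop(pts, height=768, width=1024, rng=rng)
```

Note that this filter only applies to training crops. At test time the patches are fixed by the tiling, so nothing prevents a patch from holding more points than there are queries.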

The number of queries is a hyperparameter. In practice, we find that nearly all cropped patches contain fewer than 500 people, so 500 queries cover most dense cases. We also think a promising direction is to design a dynamic number of queries.
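One way to check this observation on your own data is to count GT points per 256x256 tile and see how many tiles exceed the query budget. The helper `patches_over_budget` below is hypothetical and assumes annotations satisfy 0 <= x < width and 0 <= y < height. A tile is flagged only when it holds more points than there are queries, since only then must some people be missed.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

def patches_over_budget(
    points: Iterable[Tuple[float, float]],
    height: int,
    width: int,
    patch: int = 256,
    num_queries: int = 500,
) -> Tuple[int, int]:
    """Return (total patches, patches whose GT count exceeds num_queries)."""
    rows = math.ceil(height / patch)  # rows of tiles after padding
    cols = math.ceil(width / patch)   # columns of tiles after padding
    # Map each (x, y) annotation to the (row, col) index of its tile.
    counts = Counter((int(y) // patch, int(x) // patch) for x, y in points)
    over = sum(1 for c in counts.values() if c > num_queries)
    return rows * cols, over

# Hypothetical example: 600 heads packed into the top-left tile of a 768x1024 image.
dense = [(10.0 + i % 200, 10.0 + i // 200) for i in range(600)]
print(patches_over_budget(dense, height=768, width=1024))  # (12, 1)
```

Summing both numbers over a dataset gives the fraction of patches that a fixed budget of `num_queries` cannot fully cover.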

Thanks, I understand now.