Clarification on keypoints handling in Match R-CNN
fColangelo opened this issue · comments
Hi, I was reading your paper and some questions came up concerning the handling of landmark information in the Match R-CNN framework.
In this issue (#27) it appears that landmark estimation is handled as a pose estimation problem. If that is the case, my understanding is that the complete set of concatenated keypoints (294) should be estimated for each image.
However, in this issue (#28), the landmarks are said to be estimated as a one-hot encoded mask. In this case, my guess is that the network should have one plane for each class (since otherwise the output would be very high-dimensional).
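To make the question concrete, here is a minimal sketch of the two output layouts I am imagining (a guess only; the 56x56 heatmap resolution is assumed from Mask R-CNN, and the per-class keypoint count of 25 is a hypothetical example, not taken from the paper):

```python
import numpy as np

HEATMAP_SIZE = 56          # assumed Mask R-CNN-style keypoint heatmap resolution
NUM_TOTAL_KEYPOINTS = 294  # union of keypoints across all clothing categories

# (a) Pose-estimation style: one heatmap per keypoint in the
#     concatenated 294-keypoint set, predicted for every RoI.
pose_style_output = np.zeros((NUM_TOTAL_KEYPOINTS, HEATMAP_SIZE, HEATMAP_SIZE))

# (b) Per-class style: only the keypoints of the detected category
#     are predicted (25 here is a made-up example count), so the
#     output tensor is far smaller per RoI.
KEYPOINTS_FOR_CLASS = 25   # hypothetical count for one clothing category
per_class_output = np.zeros((KEYPOINTS_FOR_CLASS, HEATMAP_SIZE, HEATMAP_SIZE))

print(pose_style_output.shape)  # (294, 56, 56)
print(per_class_output.shape)   # (25, 56, 56)
```

If layout (a) is used, I would expect most of the 294 channels to be unsupervised for any given garment; layout (b) avoids this but requires routing by predicted class.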
Could you please clarify how the landmark sub-network is trained?
Thanks!