Clarification on keypoints handling in Match R-CNN
fColangelo opened this issue · comments
Hi, I was reading your paper and some questions came up concerning the handling of landmark information in the Match R-CNN framework.
In this issue (#27) it appears that landmark estimation is handled as a pose estimation problem. If that is the case, my understanding is that the complete set of concatenated keypoints (294) should be estimated for each image.
However, in this issue (#28), the landmarks are said to be estimated as a one-hot encoded mask. In this case, my guess is that the network should have one plane for each class (since otherwise the output would be very high-dimensional).
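To make the question concrete, here is a minimal sketch of the two output layouts I am imagining (a guess only; the 56x56 heatmap resolution is assumed from Mask R-CNN, and the per-class keypoint count of 25 is a hypothetical example, not taken from the paper):

```python
import numpy as np

HEATMAP_SIZE = 56          # assumed Mask R-CNN-style keypoint heatmap resolution
NUM_TOTAL_KEYPOINTS = 294  # union of keypoints across all clothing categories

# (a) Pose-estimation style: one heatmap per keypoint in the
#     concatenated 294-keypoint set, predicted for every RoI.
pose_style_output = np.zeros((NUM_TOTAL_KEYPOINTS, HEATMAP_SIZE, HEATMAP_SIZE))

# (b) Per-class style: only the keypoints of the detected category
#     are predicted (25 here is a made-up example count), so the
#     output tensor is far smaller per RoI.
KEYPOINTS_FOR_CLASS = 25   # hypothetical count for one clothing category
per_class_output = np.zeros((KEYPOINTS_FOR_CLASS, HEATMAP_SIZE, HEATMAP_SIZE))

print(pose_style_output.shape)  # (294, 56, 56)
print(per_class_output.shape)   # (25, 56, 56)
```

If layout (a) is used, I would expect most of the 294 channels to be unsupervised for any given garment; layout (b) avoids this but requires routing by predicted class.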
Could you please clarify how the landmark sub-network is trained?
Thanks!