jkxing / DROT

Code for the SIGGRAPH Asia 2022 paper "Differentiable Rendering using RGBXY Derivatives and Optimal Transport"

Masked Sinkhorn Matching

Jingyu6 opened this issue · comments

Hi @jkxing,

Thanks for the nice work! I'm currently trying to adapt DROT to my project under the following setting: my rendered image has an RGB map, a point map, and a mask from the NVDiffrastRenderer, and my reference image also comes with an RGB map and an additional mask. I'm not sure how the Sinkhorn matching should be done with masked objects. Here's my current plan:

For the RGB maps of both the rendered and the ground-truth images, we set the background (pixels outside the mask) to all zeros. For the XY map, we do the same as what you did: set the background to the normal default values and copy everything to the reference image. Then we run the Sinkhorn divergence loss on the whole RGBXY map.
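To make the plan concrete, here's a minimal sketch of what I have in mind (the tensor names and shapes are my own assumptions, not the repo's code):

```python
import torch

# Hypothetical inputs (my naming): rgb (H, W, 3), pos_xy (H, W, 2),
# mask (H, W) boolean. This only illustrates the masking scheme above.
def build_point_5d(rgb, pos_xy, mask, background_value=0.0):
    rgb = rgb.clone()
    rgb[~mask] = background_value                 # zero out RGB outside the mask
    point_5d = torch.cat([rgb, pos_xy], dim=-1)   # (H, W, 5) RGBXY map
    return point_5d.reshape(-1, 5)                # flatten to (H*W, 5) for matching

# For the reference image I would copy the rendered XY channels, as described,
# and only replace the RGB channels with the (masked) ground-truth colors.
```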

Is this how it should be implemented? Also, if we only care about a subset of pixels (those inside the mask), can we run Sinkhorn on just these two subsets to save computation, i.e. loss = sinkhorn(rendered_point_5d[mask == True], target_point_5d[mask == True])?
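If that subset version makes sense, I imagine it would look roughly like this (assuming a geomloss-style SamplesLoss; I'm not sure this matches how the repo actually calls its Sinkhorn solver):

```python
import torch
from geomloss import SamplesLoss

# blur/scaling values here are placeholders, not the repo's settings
sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.01, scaling=0.9)

def masked_sinkhorn(rendered_point_5d, target_point_5d, mask):
    # rendered_point_5d, target_point_5d: (H*W, 5); mask: (H*W,) boolean
    x = rendered_point_5d[mask]   # keep only the pixels we care about
    y = target_point_5d[mask]
    # Sinkhorn can also handle subsets of different sizes if the two masks differ
    return sinkhorn(x, y)
```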

Another question: when we do the Sinkhorn matching, the target we end up with is (render_point_5d - g.reshape(-1, h, w, 5)).detach(), which is then used in the final loss as mse(match_point_5d, render_point_5d). This essentially drives the Sinkhorn gradient g toward zero. Is this equivalent to minimizing the Sinkhorn divergence directly? If so, why do we take the extra step of computing g instead of just minimizing the Sinkhorn loss itself? Is it there so the matching can be updated only periodically, for efficiency? How should we interpret this?
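For reference, here is how I currently read that step, written out explicitly (the names follow the lines quoted above, but this is just my pseudocode for discussion, not the actual implementation):

```python
import torch
import torch.nn.functional as F
from geomloss import SamplesLoss

# Again assuming a geomloss-style Sinkhorn; blur is a placeholder value.
sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.01)

def matching_step(render_point_5d, target_point_5d):
    # render_point_5d, target_point_5d: (B, H, W, 5)
    b, h, w, _ = render_point_5d.shape
    pts = render_point_5d.detach().reshape(b, -1, 5).requires_grad_(True)
    tgt = target_point_5d.detach().reshape(b, -1, 5)
    loss = sinkhorn(pts, tgt).sum()
    # g: gradient of the Sinkhorn divergence w.r.t. the rendered 5D points
    g, = torch.autograd.grad(loss, pts)
    # The matched target is one step along -g, frozen with detach()
    match_point_5d = (render_point_5d - g.reshape(-1, h, w, 5)).detach()
    # The loss that actually backpropagates through the renderer is a plain MSE
    # toward that frozen target; its gradient here is proportional to g
    return F.mse_loss(match_point_5d, render_point_5d)
```

Is this reading correct, or am I missing something about why the detached target is used?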

Thanks a lot for the help.

Best,
Jingyu