About the implementation details of using refinemask head in Cascade R-CNN or HTC

Question

zhengye1995 opened this issue 3 years ago

zhengye commented 3 years ago

Thanks for your insightful work. I am trying to add the RefineMask head to Cascade R-CNN and HTC, and I have two questions:

1. During inference in Cascade R-CNN or HTC, the final mask prediction is the average of the mask predictions from every stage (3 stages in Cascade). If I add the RefineMask head only to the last stage, the mask output sizes of the 3 cascade stages will be [28, 28, 112], which cannot be averaged directly. How should I obtain the same mask size in all stages: upsample 28 to 112, downsample 112 to 28, or just use the mask result from the last cascade stage?
2. Both HTC and RefineMask have a semantic head. The most important difference is the input feature level: HTC uses the penultimate level feature as input, while RefineMask uses the first one. Is it possible to unify these two semantic heads?
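To make the size mismatch in the first question concrete, here is a minimal NumPy sketch of the two resizing options (bring every stage to 112, or bring every stage to 28, then average). The function names, the nearest-neighbour upsampling and the average-pooling downsampling are all assumptions chosen for illustration. None of this is code from RefineMask or mmdetection, and a real implementation would more likely use bilinear interpolation on the mask logits or probabilities.

```python
import numpy as np

def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling of an (H, W) mask by an integer factor."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)

def downsample_mean(mask, factor):
    """Average-pool an (H, W) mask by an integer factor (H and W divisible by it)."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def average_at_high_res(stage_masks, out_size=112):
    """Option A: upsample every stage to out_size, then average over stages."""
    resized = [upsample_nearest(m, out_size // m.shape[0]) for m in stage_masks]
    return np.mean(resized, axis=0)

def average_at_low_res(stage_masks, out_size=28):
    """Option B: downsample every stage to out_size, then average over stages."""
    resized = [downsample_mean(m, m.shape[0] // out_size) for m in stage_masks]
    return np.mean(resized, axis=0)

# Dummy per-stage mask probabilities with the sizes from the question.
rng = np.random.default_rng(0)
stage_masks = [rng.random((28, 28)), rng.random((28, 28)), rng.random((112, 112))]

high = average_at_high_res(stage_masks)
low = average_at_low_res(stage_masks)
print(high.shape, low.shape)  # (112, 112) (28, 28)
```

The third option from the question, using only the last stage, would simply be `stage_masks[-1]`.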

Happy Spring Festival!

Gang Zhang · Answer 1 · Sat Feb 12 2022 11:30:06 GMT+0800 (China Standard Time)

1. I have not tried any of the strategies you mention, so you can simply try them. For the results in the paper (those submitted to the LVIS Challenge), we applied RefineMask to all three stages of HTC: the first two stages have a maximum output size of 28x28, and the last stage has an output size of 112x112. During inference, we average the intermediate 28x28 outputs from all three stages and feed the result to the remaining parts of the last stage in HTC.
2. RefineMask needs high-resolution features as input to the semantic head in order to make precise mask predictions. You can try to unify these two semantic heads. Good luck!
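For readers following along, the inference flow described in this answer could be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' code: `refine_step` is a hypothetical placeholder for one refinement step of the last stage (the real module fuses semantic and instance features), and the choice of two remaining 2x steps is inferred only from the 28 and 112 sizes mentioned above.

```python
import numpy as np

def refine_step(mask, fine_feats):
    """Hypothetical stand-in for one refinement step of the last stage.

    Upsamples the mask 2x and adds a residual computed from finer features.
    This is NOT the real RefineMask module, just a shape-correct placeholder.
    """
    up = np.repeat(np.repeat(mask, 2, axis=0), 2, axis=1)
    return up + fine_feats

def htc_refinemask_inference(stage_outputs_28, fine_feats_per_step):
    """Average the 28x28 intermediate outputs of all stages, then run only
    the remaining refinement steps of the last stage on that average."""
    mask = np.mean(stage_outputs_28, axis=0)  # (28, 28), averaged over stages
    for feats in fine_feats_per_step:         # assumed: 28 -> 56 -> 112
        mask = refine_step(mask, feats)
    return mask

# Dummy inputs: three stages with 28x28 intermediate outputs, and zero
# residuals for the two assumed remaining steps.
stage_outputs_28 = [np.full((28, 28), v) for v in (0.2, 0.4, 0.6)]
fine_feats_per_step = [np.zeros((56, 56)), np.zeros((112, 112))]

final_mask = htc_refinemask_inference(stage_outputs_28, fine_feats_per_step)
print(final_mask.shape)  # (112, 112)
```

The point of the design is that averaging happens once, at the resolution all stages share, so no stage output has to be resized before the ensemble.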

zhengye · Answer 2 · Sat Feb 12 2022 11:42:07 GMT+0800 (China Standard Time)

Thanks for your reply, I got it.