Some questions about the Match-Net.

Question

xwjabc opened this issue 5 years ago

For the retrieval task, the paper mentions that 12 epochs are used for training. How is one epoch defined? Does it mean one pass over all image pairs (i.e. the 337,293 image pairs in #30)?
In #31 (and also in #17), the network details show that a fixed number of proposals (i.e. 8) is used for each image in the match net during training. However, the number of available proposals for an image is sometimes less than 8, yet in #31 the number of proposals is always 8. Are augmented proposals used in that case?
If we use the mask features (after RoIAlign) for the match net, the spatial resolution is 14x14, right? How are the bbox RoI features (spatial resolution 7x7) and the mask RoI features (spatial resolution 14x14) combined?
Is it possible to get the coefficients for the loss terms?
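
To make the question about the fixed number of proposals concrete, here is a rough sketch of the kind of padding I imagine might happen when an image has fewer than 8 proposals. This is only my guess, not something taken from the paper or the code: `pad_proposals` is a hypothetical helper, and repeating existing boxes is just one possible strategy (jittered copies of the ground-truth boxes would be another).

```python
import random

NUM_PROPOSALS = 8  # fixed per-image proposal count seen in #31 and #17


def pad_proposals(proposals, num=NUM_PROPOSALS, seed=None):
    """Return exactly `num` proposals (hypothetical helper, not from the repo).

    Too few proposals: keep them all and append randomly repeated ones.
    Too many proposals: keep a random subset of size `num`.
    """
    if not proposals:
        raise ValueError("need at least one proposal to pad from")
    rng = random.Random(seed)
    if len(proposals) >= num:
        return rng.sample(proposals, num)
    # Sample with replacement from the existing boxes to fill the remainder.
    extra = rng.choices(proposals, k=num - len(proposals))
    return list(proposals) + extra


# Example: an image with only 3 boxes in (x1, y1, x2, y2) format.
boxes = [(10, 20, 110, 220), (30, 40, 90, 160), (5, 5, 60, 80)]
padded = pad_proposals(boxes, seed=0)
print(len(padded))
```

If something like this is used, it would be helpful to know whether the repeated proposals are exact duplicates or perturbed versions of the originals.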