Brummi / MonoRec

Official implementation of the paper: MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera (CVPR 2021)

Mask Module Cost Volume Aggregation Method

fengziyue opened this issue:

Hi @Brummi and @nynyg:

Thank you for sharing the great work!
I'm curious why you chose max-pooling to combine the multiple cost volumes in the mask module. Is it only because max-pooling works for an arbitrary number of input frames, or do you have further theoretical analysis or assumptions behind it? Have you tried other aggregation methods such as SUM, AVG, or CONCAT?

Thank you again!

Hi!

We chose max pooling for three reasons:

  1. It works with any number of frames.
  2. The idea is that the mask module encodes each single-frame cost volume (CV) into a feature vector. When the frames are inconsistent, the different single-frame CVs produce different feature activations. Max pooling keeps these differing activations, so the decoder can pick them up. Sum, Avg, etc. would reduce the difference, which we don't want.
  3. Max pooling has often been used for similar tasks in the literature.
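
To make points 1 and 2 concrete, here is a minimal toy sketch in plain Python. It is not the MonoRec code: the `aggregate` helper and the feature values are hypothetical, and each short list stands in for the encoded features of one single-frame cost volume. It only illustrates that an element-wise max accepts any number of frames and keeps a deviating activation at full strength, while averaging dilutes it.

```python
from typing import List, Sequence


def aggregate(features: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    """Combine per-frame feature vectors element-wise (hypothetical helper).

    Works for any number of frames, as long as there is at least one.
    """
    if not features:
        raise ValueError("need at least one frame")
    # Regroup values so each tuple holds one channel across all frames.
    channels = list(zip(*features))
    if mode == "max":
        return [max(c) for c in channels]
    if mode == "avg":
        return [sum(c) / len(c) for c in channels]
    raise ValueError(f"unknown mode: {mode}")


# Made-up features for four frames. In the consistent case all frames agree;
# in the inconsistent case one frame has a strong activation in channel 0.
consistent = [[0.0, 0.5]] * 4
inconsistent = [[0.0, 0.5], [0.0, 0.5], [0.0, 0.5], [1.0, 0.5]]

print(aggregate(consistent, "max"))    # [0.0, 0.5]
print(aggregate(inconsistent, "max"))  # [1.0, 0.5]  deviating activation kept
print(aggregate(inconsistent, "avg"))  # [0.25, 0.5] deviating activation diluted

# The same call works with a different number of frames.
print(aggregate(inconsistent[2:], "max"))  # [1.0, 0.5]
```

In this toy setting, channel 0 differs between the consistent and inconsistent cases by 1.0 after max pooling but only by 0.25 after averaging, and the averaged gap shrinks further as more consistent frames are added.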