TUI-NICR / ESANet

ESANet: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis


What is the difference between PPM and APPM?

amokame opened this issue · comments

Sorry for asking so many questions...
Your work is fantastic! It works great on my own dataset.
I'm now reading the code and trying to understand how some essential parts work.

I have a question about the APPM.
I'm a little confused about the difference between PPM and APPM, and why appm-1-2-4-8 performs better on Cityscapes.
In the code, there is

bin_multiplier_h = int((h / h_inp) + 0.5)
bin_multiplier_w = int((w / w_inp) + 0.5)

I think that h and h_inp are the same, so bin_multiplier_h is 1 and h_pool = bin.
Then adaptive pooling is applied, producing an output of size (bin, bin).
This seems the same as PPM.

Could you explain the difference in more detail? Thank you very much.

PPM uses fixed output shapes in the context module branches, while APPM adapts the output shapes based on the shape of the actual input. Let me give a short example: assume network training was done with 480x640 inputs, but inference should be done with 960x1280 inputs. In this scenario, PPM adapts the pooling windows to be twice as large to create exactly the same output shapes as during training, while APPM keeps the pooling windows fixed and, thus, doubles the spatial size of the feature maps in the branches.

So PPM keeps the number of context features fixed, whereas APPM fixes the areas within which the context features are aggregated.
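To make the contrast concrete, here is a minimal sketch (the helper names are hypothetical, not from the ESANet code) of how the output shape of one context branch differs between the two variants. It reuses the bin-multiplier rounding from the snippet above and assumes a 480x640 training resolution with a downsampling factor of 32, so feature maps are 15x20 during training and 30x40 for 960x1280 inference:

```python
# Hypothetical helpers contrasting PPM and APPM branch output shapes.

def ppm_branch_shape(h, w, bin_size):
    # PPM: the branch output is always (bin, bin), regardless of the
    # input size; the pooling windows grow with the input instead.
    return (bin_size, bin_size)

def appm_branch_shape(h, w, h_train, w_train, bin_size):
    # APPM: scale the number of bins by the ratio of the actual feature
    # map size to the training feature map size (rounded to the nearest
    # integer), which keeps the pooled area per bin roughly fixed.
    bin_multiplier_h = int((h / h_train) + 0.5)
    bin_multiplier_w = int((w / w_train) + 0.5)
    return (bin_size * bin_multiplier_h, bin_size * bin_multiplier_w)

# Training: 480x640 inputs, downsampled by 32 -> 15x20 feature maps.
# Inference: 960x1280 inputs -> 30x40 feature maps.
print(ppm_branch_shape(30, 40, 4))           # -> (4, 4): same as training
print(appm_branch_shape(30, 40, 15, 20, 4))  # -> (8, 8): spatial size doubles
```

At the training resolution both variants coincide (the multipliers are 1, as noted in the question); the difference only appears when the inference resolution deviates from the training resolution.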

Thank you so much for your explanation!
I didn't consider the situation where the input image sizes are different.
I understand it now. Thanks!