XiaLiPKU / EMANet

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Home Page: https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/


selection of K

kwea123 opened this issue and commented:

In my opinion, besides T, the selection of K is also important (as in GMM or k-means). I didn't see any ablation study on the effect of different values of K; did you run any experiments on this?

Intuitively, I have the impression that mu represents different features for different classes, so the first K I would try is the number of classes (e.g. 19 for Cityscapes). Can you explain how you decided on K=64?

As the visualization of the responsibilities shows, different z's tend to represent different classes, so won't having K greater than the number of classes make some z's effectively close to each other, and eventually redundant?

Thanks.
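For context, here is a minimal sketch of the EM attention iteration under discussion (PyTorch-style; the names are illustrative and this is a simplification of the repo's actual module, not its exact API). K is the number of bases mu, and hence the rank of the reconstruction:

```python
import torch
import torch.nn.functional as F

def ema_step(x, mu, T=3):
    """One forward pass of EM attention (sketch).

    x:  (B, N, C) pixel features, N = H*W
    mu: (B, K, C) the K bases discussed above
    """
    for _ in range(T):
        # E-step: responsibilities of each pixel for each base
        z = F.softmax(x @ mu.transpose(1, 2), dim=2)      # (B, N, K)
        # M-step: update bases as responsibility-weighted means of pixels
        z_ = z / (z.sum(dim=1, keepdim=True) + 1e-6)      # normalize over pixels
        mu = z_.transpose(1, 2) @ x                       # (B, K, C)
        mu = F.normalize(mu, dim=2)                       # L2-normalize each base
    # reconstruction: project features back onto the K bases
    x_rec = z @ mu                                        # (B, N, C)
    return x_rec, mu
```

Since x_rec is a combination of only K bases, K directly bounds the rank of the reconstructed feature map.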

Yeah, K is significant, because it represents the rank of the reconstruction. Recently I ran some comparisons on it and found that 16, 32, 64, and 128 all led to similar performance (though larger is better).

As for the number 64, I simply selected it to keep the overall FLOPs similar to those of a 1x1 conv. Moreover, 19 for Cityscapes may not be enough, as pixels belonging to the same class may form several feature clusters. A larger K does introduce redundancy, but it can still cover more corner cases.
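As a rough back-of-the-envelope check of the FLOPs argument (assuming C = 512 channels and T = 3 EM iterations; the paper's exact accounting may differ):

```python
# Per-pixel MAC counts, illustrative numbers only.
C, K, T = 512, 64, 3

conv1x1 = C * C                 # 1x1 conv: C_in * C_out per pixel
ema = (2 * T + 1) * K * C       # T E-steps + T M-steps + reconstruction

print(conv1x1, ema)             # 262144 vs 229376: comparable cost
```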

Since K is just the 'rank', you can think of it from the low-rank matrix reconstruction perspective: a larger K gives a better reconstruction, while a smaller K gives less noise.
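To make the low-rank intuition concrete, a small NumPy sketch of rank-K truncation via SVD (random data, illustrative only, not tied to the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 512))          # N pixels x C channels

U, S, Vt = np.linalg.svd(X, full_matrices=False)
for K in (16, 32, 64, 128):
    X_K = (U[:, :K] * S[:K]) @ Vt[:K]         # best rank-K approximation
    err = np.linalg.norm(X - X_K) / np.linalg.norm(X)
    print(K, round(err, 3))                   # error shrinks as K grows
```

The same trade-off applies to the EMA bases: raising K reduces reconstruction error, while lowering K acts as a stronger denoiser.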