HRNet / HRNet-Semantic-Segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919


How is multi-scale fusion performed?

kadattack opened this issue · comments

How exactly is multi-scale fusion performed when combining the outputs from the different branches into one?
I am asking about the process that happens AFTER the strided convolutions and the upscaling are applied to bring all of them to the same size.
Does it do a simple element-wise sum of all the outputs? Or does it concatenate the outputs into different channels?

You can see this in the forward pass of the HighResolutionNet module. After the interpolation upsampling, the resulting feature maps are concatenated along the channel dimension and then passed through the last_layer submodule, which consists of:

self.last_layer = nn.Sequential(
    nn.Conv2d(
        in_channels=last_inp_channels,
        out_channels=last_inp_channels,
        kernel_size=1,
        stride=1,
        padding=0),
    BatchNorm2d(last_inp_channels, momentum=BN_MOMENTUM),
    nn.ReLU(inplace=relu_inplace),
    nn.Conv2d(
        in_channels=last_inp_channels,
        out_channels=config["arch"]["num_classes"],
        kernel_size=extra["FINAL_CONV_KERNEL"],
        stride=1,
        padding=1 if extra["FINAL_CONV_KERNEL"] == 3 else 0))

There's a final interpolation to enforce that the output size = input size.
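To make the concatenation step concrete, here is a minimal sketch of the merge at the network head. The channel counts (48/96/192/384) are illustrative and follow the HRNet-W48 configuration; the exact values depend on the config file used.

```python
import torch
import torch.nn.functional as F

# Four branch outputs at decreasing resolution (shapes are illustrative).
x0 = torch.randn(1, 48, 128, 128)   # highest-resolution branch
x1 = torch.randn(1, 96, 64, 64)
x2 = torch.randn(1, 192, 32, 32)
x3 = torch.randn(1, 384, 16, 16)

h, w = x0.shape[2], x0.shape[3]
# Upsample the lower-resolution branches to the highest resolution...
x1u = F.interpolate(x1, size=(h, w), mode='bilinear', align_corners=False)
x2u = F.interpolate(x2, size=(h, w), mode='bilinear', align_corners=False)
x3u = F.interpolate(x3, size=(h, w), mode='bilinear', align_corners=False)

# ...then concatenate along the channel dimension (not an element-wise sum).
merged = torch.cat([x0, x1u, x2u, x3u], dim=1)
print(merged.shape)  # torch.Size([1, 720, 128, 128]); 48+96+192+384 = 720
```

The merged tensor is what last_layer receives as last_inp_channels.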


I'm very new to AI and PyTorch, but isn't this the code for the final output of the whole HRNet? I don't know if we are thinking about the same thing. Just to confirm, I'm talking about the merge process that happens throughout the whole net.

From my understanding, this fusion is built in the function

def _make_fuse_layers(self):

However, I'm still not good enough to understand what happens at the end of the forward() function:

y = y + self.fuse_layers[i][j](x[j])

It looks like it's combining the branch outputs with element-wise addition?
Am I looking at the wrong part of the code?
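For reference, the loop line above can be sketched as follows, under the assumption that fuse_layers[i][j] resamples branch j's output to branch i's resolution and channel count, after which the results are accumulated by element-wise addition. The layer definitions and shapes below are simplified stand-ins, not the repo's exact modules (the real _make_fuse_layers uses strided-conv chains for downsampling and identity for i == j).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two branches at different resolutions (shapes are illustrative).
x = [torch.randn(1, 48, 64, 64), torch.randn(1, 96, 32, 32)]

# fuse_layers[i][j]: transform branch j's output to match branch i.
fuse_1_0 = nn.Conv2d(48, 96, kernel_size=3, stride=2, padding=1)  # downsample 0 -> 1
fuse_0_1 = nn.Sequential(
    nn.Conv2d(96, 48, kernel_size=1),                              # match channels
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),  # upsample 1 -> 0
)

# y = y + self.fuse_layers[i][j](x[j]) accumulates by element-wise addition:
y0 = x[0] + fuse_0_1(x[1])   # branch 0 output: its own features + branch 1's
y1 = x[1] + fuse_1_0(x[0])   # branch 1 output: its own features + branch 0's
print(y0.shape, y1.shape)    # torch.Size([1, 48, 64, 64]) torch.Size([1, 96, 32, 32])
```

So within the stages the fusion is an element-wise sum per branch, while the concatenation discussed earlier happens only once, at the final head.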