HRNet / HRNet-Semantic-Segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919


How is multi-scale fusion performed?

kadattack opened this issue · comments

How exactly is multi-scale fusion performed when combining the outputs from the different branches into one?
I am asking about the process that happens AFTER the strided convolutions and the upscaling are applied to bring all of them to the same size.
Does it do a simple element-wise sum of all the outputs? Or does it concatenate the outputs into different channels?

You can see this in the forward pass of the HighResolutionNet module. After the interpolation upsampling, the resulting feature maps are concatenated along the channel dimension and then passed through the last_layer submodule, which consists of:

self.last_layer = nn.Sequential(
    nn.Conv2d(
        in_channels=last_inp_channels,
        out_channels=last_inp_channels,
        kernel_size=1,
        stride=1,
        padding=0),
    BatchNorm2d(last_inp_channels, momentum=BN_MOMENTUM),
    nn.ReLU(inplace=relu_inplace),
    nn.Conv2d(
        in_channels=last_inp_channels,
        out_channels=config["arch"]["num_classes"],
        kernel_size=extra["FINAL_CONV_KERNEL"],
        stride=1,
        padding=1 if extra["FINAL_CONV_KERNEL"] == 3 else 0))

There's a final interpolation to enforce that the output size = input size.
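To make the concatenation step concrete, here is a minimal sketch of the merge at the network head. The channel counts (48/96/192/384) are illustrative and follow the HRNet-W48 configuration; the exact values depend on the config file used.

```python
import torch
import torch.nn.functional as F

# Four branch outputs at decreasing resolution (shapes are illustrative).
x0 = torch.randn(1, 48, 128, 128)   # highest-resolution branch
x1 = torch.randn(1, 96, 64, 64)
x2 = torch.randn(1, 192, 32, 32)
x3 = torch.randn(1, 384, 16, 16)

h, w = x0.shape[2], x0.shape[3]
# Upsample the lower-resolution branches to the highest resolution...
x1u = F.interpolate(x1, size=(h, w), mode='bilinear', align_corners=False)
x2u = F.interpolate(x2, size=(h, w), mode='bilinear', align_corners=False)
x3u = F.interpolate(x3, size=(h, w), mode='bilinear', align_corners=False)

# ...then concatenate along the channel dimension (not an element-wise sum).
merged = torch.cat([x0, x1u, x2u, x3u], dim=1)
print(merged.shape)  # torch.Size([1, 720, 128, 128]); 48+96+192+384 = 720
```

The merged tensor is what last_layer receives as last_inp_channels.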


I'm very new to AI and PyTorch, but isn't this the code for the final output of the whole HRNet? I don't know if we are thinking about the same thing. Just to confirm, I'm talking about the merge process that happens throughout the whole net.

From my understanding, this fusion is built in the function

def _make_fuse_layers(self):

However, I'm still not good enough to understand what happens at the end of the forward() function:

y = y + self.fuse_layers[i][j](x[j])

It looks like it's combining the branch outputs with element-wise addition?
Am I looking at the wrong part of the code?
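For reference, the loop line above can be sketched as follows, under the assumption that fuse_layers[i][j] resamples branch j's output to branch i's resolution and channel count, after which the results are accumulated by element-wise addition. The layer definitions and shapes below are simplified stand-ins, not the repo's exact modules (the real _make_fuse_layers uses strided-conv chains for downsampling and identity for i == j).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two branches at different resolutions (shapes are illustrative).
x = [torch.randn(1, 48, 64, 64), torch.randn(1, 96, 32, 32)]

# fuse_layers[i][j]: transform branch j's output to match branch i.
fuse_1_0 = nn.Conv2d(48, 96, kernel_size=3, stride=2, padding=1)  # downsample 0 -> 1
fuse_0_1 = nn.Sequential(
    nn.Conv2d(96, 48, kernel_size=1),                              # match channels
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),  # upsample 1 -> 0
)

# y = y + self.fuse_layers[i][j](x[j]) accumulates by element-wise addition:
y0 = x[0] + fuse_0_1(x[1])   # branch 0 output: its own features + branch 1's
y1 = x[1] + fuse_1_0(x[0])   # branch 1 output: its own features + branch 0's
print(y0.shape, y1.shape)    # torch.Size([1, 48, 64, 64]) torch.Size([1, 96, 32, 32])
```

So within the stages the fusion is an element-wise sum per branch, while the concatenation discussed earlier happens only once, at the final head.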