[TokenFusion] What is the meaning of x[0] and x[1] in TokenFusion segmentation?
CaffreyR opened this issue · comments
Guorun commented
Hi, many thanks for your work. While trying to reproduce your code, I noticed that the forward pass has 4 stages, and each stage uses code like this:
x, H, W = self.patch_embed4(x)
for i, blk in enumerate(self.block4):
    score = self.score_predictor[3](x)
    mask = [F.softmax(score_.reshape(B, -1, 2), dim=2)[:, :, 0] for score_ in score]  # mask_: [B, N]
    masks.append(mask)
    x = blk(x, H, W, mask)
x = self.norm4(x)
x = [x_.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() for x_ in x]
outs0.append(x[0])
outs1.append(x[1])
Do x[0] and x[1] refer to the RGB and depth inputs? If so, where does TokenFusion take place?
Many thanks!
Xinghao Chen commented
- x[0] and x[1] refer to the outputs of the RGB and depth streams.
- The TokenFusion operation takes place in https://github.com/huawei-noah/noah-research/blob/master/TokenFusion/semantic_segmentation/models/mix_transformer.py#L122
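
For readers following along, here is a minimal sketch of how per-token masks like the ones in the question could drive an exchange between the two streams. It is an illustration under stated assumptions, not necessarily the code at the linked line: the function name `token_exchange`, the `threshold` parameter and its value, and the toy scores are all hypothetical. The only parts taken from the thread are the two-element list convention (x[0] for RGB, x[1] for depth) and the softmax-based mask expression.

```python
import torch
import torch.nn.functional as F


def token_exchange(x, mask, threshold):
    """Hypothetical helper: swap low-scoring tokens between two streams.

    x:    [x_rgb, x_depth], each of shape [B, N, C]
    mask: [m_rgb, m_depth], each of shape [B, N], with scores in [0, 1]
    """
    keep_rgb = (mask[0] >= threshold).unsqueeze(-1)    # [B, N, 1]
    keep_depth = (mask[1] >= threshold).unsqueeze(-1)  # [B, N, 1]
    # Each stream keeps its own token where the score is high enough,
    # otherwise it takes the token at the same position from the other stream.
    x_rgb = torch.where(keep_rgb, x[0], x[1])
    x_depth = torch.where(keep_depth, x[1], x[0])
    return [x_rgb, x_depth]


B, N, C = 1, 4, 3
# Toy tokens: RGB tokens are all ones, depth tokens are all zeros.
x = [torch.ones(B, N, C), torch.zeros(B, N, C)]

# Made-up 2-way logits per token, standing in for a score predictor's output.
score = [
    torch.tensor([[[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [5.0, 0.0]]]),  # RGB
    torch.tensor([[[0.0, 5.0], [5.0, 0.0], [5.0, 0.0], [5.0, 0.0]]]),  # depth
]
# Same expression as in the question: softmax over the 2 classes, keep channel 0.
mask = [F.softmax(s.reshape(B, -1, 2), dim=2)[:, :, 0] for s in score]  # each [B, N]

fused = token_exchange(x, mask, threshold=0.5)
```

In this toy setup, the RGB token at position 1 has a low score and is replaced by the depth token at the same position, while the depth token at position 0 is replaced by its RGB counterpart. Both streams keep the shape [B, N, C], so the rest of the block can continue to treat x as a list of two tensors.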