yeerwen / UniSeg

MICCAI 2023 Paper (Early Acceptance)

Can you explain more about dealing with different input modality channels?

chengyu89527 opened this issue

It's great work. However, I have some questions.
You mention, "To accept the multi-modality inputs, we reform the first convolution layer of the model and set up three different convolution layers to handle the input with one, two, or four channels". As I understand it, this means the first layer differs depending on the number of modalities in each dataset, and you swap in one of these three layers according to the input. But a universal model aims to handle all tasks with one model. How many input channels do you set for the nnUNet backbone? How do you handle inputs with different channel counts, and why not fix the input at 4 channels and fill the missing modalities with zeros? Are the results in Tables 1 and 2 obtained with one universal model trained on all the pre-training tasks?
Thanks very much!

(1) UniSeg is designed with three layers, each tailored to handle inputs of 1, 2, or 4 channels. This configuration does not compromise the objectives of a Universal model, especially considering the majority of the parameters are shared across tasks.
(2) When managing inputs of varying channel counts, another strategy might be to duplicate input images to compensate for absent channels. Fixing the input at four channels and zero-filling the missing modalities, as you suggest, might not always be ideal, because it is hard to assign a distinct modality to each channel.
(3) Table 1 provides detailed information on all the datasets utilized in the study. Meanwhile, Table 2 distinguishes between Single-task Models and the Universal Model: the former are both trained and tested on individual datasets, whereas the Universal Model undergoes joint training on all upstream datasets before being tested on all test datasets.
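For readers comparing the two alternatives discussed in (2), here is a minimal NumPy sketch of bringing a volume up to a fixed number of channels, either by duplicating existing channels or by zero-filling. The function name `pad_channels`, the `(C, D, H, W)` layout, and the cyclic duplication order are assumptions made for illustration; none of this is taken from the UniSeg code.

```python
import numpy as np

def pad_channels(image, target_channels=4, mode="duplicate"):
    """Bring a (C, D, H, W) array up to `target_channels` channels.

    mode="duplicate": repeat the existing channels cyclically.
    mode="zeros":     append all-zero channels for the missing ones.
    Hypothetical helper for illustration only.
    """
    channels = image.shape[0]
    if channels > target_channels:
        raise ValueError("image has more channels than the target")
    if channels == target_channels:
        return image
    if mode == "duplicate":
        # e.g. 2 channels -> indices [0, 1, 0, 1]
        idx = np.arange(target_channels) % channels
        return image[idx]
    if mode == "zeros":
        missing = target_channels - channels
        filler = np.zeros((missing,) + image.shape[1:], dtype=image.dtype)
        return np.concatenate([image, filler], axis=0)
    raise ValueError(f"unknown mode: {mode}")

# A single-channel (e.g. CT-only) volume expanded to four channels.
ct = np.random.rand(1, 8, 16, 16).astype(np.float32)
ct_dup = pad_channels(ct, mode="duplicate")
ct_zero = pad_channels(ct, mode="zeros")
print(ct_dup.shape, ct_zero.shape)
```

With either mode, a given channel index no longer corresponds to one fixed modality across datasets, which is the difficulty raised in (2).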

I'm not sure how to understand "UniSeg is designed with three layers, each tailored to handle inputs of 1, 2, or 4 channels". Could you point out where the three layers are in Fig. 2?

Are they three parallel single layers, one for each of the 3 kinds of input, or three branches, each three layers deep, before the vision encoder?

Three parallel layers are designed to accept inputs with different channels, respectively.

Each branch has 3 conv layers, right? Thanks.

Each branch is equipped with a single layer. To gain a better understanding of our design, I would recommend you read our code.
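As a rough illustration of the design described above (three parallel single-layer branches feeding shared layers), here is a minimal PyTorch sketch. It is not the repository's implementation: the class name `MultiStem`, the feature width of 32, the kernel size, and the small `shared` block standing in for the backbone are all assumptions.

```python
import torch
import torch.nn as nn

class MultiStem(nn.Module):
    """One first-layer convolution per supported channel count,
    followed by layers shared across all inputs (hypothetical sketch)."""

    def __init__(self, channel_options=(1, 2, 4), features=32):
        super().__init__()
        # Three parallel branches, each a single Conv3d layer.
        self.stems = nn.ModuleDict({
            str(c): nn.Conv3d(c, features, kernel_size=3, padding=1)
            for c in channel_options
        })
        # Stand-in for the shared backbone, which holds most parameters.
        self.shared = nn.Sequential(
            nn.InstanceNorm3d(features),
            nn.LeakyReLU(inplace=True),
            nn.Conv3d(features, features, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Pick the branch that matches the input's channel dimension.
        key = str(x.shape[1])
        if key not in self.stems:
            raise ValueError(f"unsupported channel count: {key}")
        return self.shared(self.stems[key](x))

model = MultiStem()
for c in (1, 2, 4):
    out = model(torch.randn(1, c, 8, 16, 16))
    print(c, tuple(out.shape))
```

Only the selected branch depends on the channel count; every input passes through the same `shared` parameters afterwards.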

I'm checking now. Thanks.

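Sorry to bother you again, but a question just occurred to me: if the input always had six channels (ct, t1, t2, t1c, t2f, pet), with any missing modality left empty, would that have any negative effect?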

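I haven't tried that, my friend, but you could give it a try. I don't think it would cause any problems; it would just extract more redundant information.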
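The untested six-channel idea could be prototyped along these lines: a minimal NumPy sketch that places each available modality in a fixed slot and leaves the others as zeros. The slot order follows the question above; the helper `pack_modalities` and the returned presence mask are assumptions added for illustration, not part of UniSeg.

```python
import numpy as np

# Fixed slot order taken from the question above.
SLOTS = ("ct", "t1", "t2", "t1c", "t2f", "pet")

def pack_modalities(available, spatial_shape):
    """Place each available modality in its fixed slot; absent ones stay zero.

    `available` maps a modality name to a (D, H, W) array.
    Returns the (6, D, H, W) volume and a boolean presence mask.
    Hypothetical helper for illustration only.
    """
    volume = np.zeros((len(SLOTS),) + tuple(spatial_shape), dtype=np.float32)
    present = np.zeros(len(SLOTS), dtype=bool)
    for name, data in available.items():
        slot = SLOTS.index(name)  # raises ValueError for an unknown modality
        volume[slot] = data
        present[slot] = True
    return volume, present

shape = (8, 16, 16)
volume, present = pack_modalities(
    {"t1": np.ones(shape), "t2": np.ones(shape)}, shape
)
print(volume.shape, present.tolist())
```

For a dataset with only one or two modalities, most slots are all zeros, so the first layer spends weights on empty channels.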