Can you explain more about dealing with different input modality channels?

chengyu89527 opened this issue a year ago · comments

It's great work. However, I have some questions.
You mention "To accept the multi-modality inputs, we reform the first convolution layer of the model and set up three different convolution layers to handle the input with one, two, or four channels". As I understand it, this means the first layer differs depending on how many modalities a dataset has, and you swap in one of these three layers according to the input. But a universal model aims to handle all tasks with one model. How many input channels do you set for the nnU-Net backbone? How do you handle inputs with different channel counts, and why not fix the count at 4 and fill the missing modalities with zeros? Are the results in Tables 1 and 2 obtained with a single universal model trained on all the pretraining tasks?
Thanks very much!

Yiwen Ye · Answer 1 · Fri Aug 18 2023 19:21:56 GMT+0800 (China Standard Time)

(1) UniSeg is designed with three layers, each tailored to handle inputs of 1, 2, or 4 channels. This configuration does not compromise the objectives of a Universal model, especially considering the majority of the parameters are shared across tasks.
(2) When managing inputs of varying channel counts, another strategy might be to duplicate input images to compensate for absent channels. Your suggestion (fixing four channels and zero-filling the missing ones) might not always be ideal, given the inherent challenges in assigning a distinct modality to each channel.
(3) Table 1 provides detailed information on all the datasets utilized in the study. Meanwhile, Table 2 distinguishes between Single-task Models and the Universal Model: the former are both trained and tested on individual datasets, whereas the Universal Model undergoes joint training on all upstream datasets before being tested on all test datasets.
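
The two padding strategies mentioned in this exchange, zero-filling the missing channels and duplicating the available ones, can be sketched in plain Python as follows. This is only an illustration and is not taken from the UniSeg code; the function names `zero_fill` and `duplicate_fill` and the nested-list image representation are assumptions made for the example.

```python
def zero_fill(channels, target):
    """Pad a list of 2D channel images with all-zero images up to `target`."""
    if not channels or len(channels) > target:
        raise ValueError("need between 1 and `target` channels")
    height, width = len(channels[0]), len(channels[0][0])
    missing = target - len(channels)
    # Build a fresh zero image for every missing channel.
    blanks = [[[0.0] * width for _ in range(height)] for _ in range(missing)]
    return list(channels) + blanks


def duplicate_fill(channels, target):
    """Repeat the available channel images cyclically up to `target`."""
    if not channels or len(channels) > target:
        raise ValueError("need between 1 and `target` channels")
    return [channels[i % len(channels)] for i in range(target)]


# Hypothetical single-modality 2x2 image padded to four channels.
ct = [[1.0, 2.0], [3.0, 4.0]]
zero_padded = zero_fill([ct], 4)
duplicated = duplicate_fill([ct], 4)
```

With zero-filling, the network sees one real channel and three empty ones; with duplication, every channel carries the same image. Neither variant tells the network which modality a channel holds, which is the difficulty raised in point (2).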

chengyu89527 · Answer 2 · Fri Aug 18 2023 19:29:15 GMT+0800 (China Standard Time)

I'm not sure how to understand "UniSeg is designed with three layers, each tailored to handle inputs of 1, 2, or 4 channels". Could you point out where the three layers are in Fig. 2?

chengyu89527 · Answer 3 · Fri Aug 18 2023 19:32:06 GMT+0800 (China Standard Time)

Are they three parallel single layers, one for each of the three kinds of input, or three branches that are each three layers deep, placed before the vision encoder?

Yiwen Ye · Answer 4 · Fri Aug 18 2023 19:36:41 GMT+0800 (China Standard Time)

Three parallel layers are designed to accept inputs with different channels, respectively.

chengyu89527 · Answer 5 · Fri Aug 18 2023 19:39:22 GMT+0800 (China Standard Time)

Each branch has 3 conv layers, right? Thanks.

Yiwen Ye · Answer 6 · Fri Aug 18 2023 19:44:58 GMT+0800 (China Standard Time)

Each branch is equipped with a single layer. To gain a better understanding of our design, I would recommend you read our code.
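
For readers who want a picture before opening the repository, the design described in this thread (three parallel branches, each a single convolution, chosen by the number of input channels) might look roughly like the PyTorch sketch below. It is not the UniSeg implementation: the class name `MultiChannelStem`, the use of `nn.Conv3d`, the kernel size, and `out_channels=32` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class MultiChannelStem(nn.Module):
    """Hypothetical stem: one convolution per supported channel count."""

    def __init__(self, channel_counts=(1, 2, 4), out_channels=32):
        super().__init__()
        # nn.ModuleDict keys must be strings, so the counts are stringified.
        self.branches = nn.ModuleDict({
            str(count): nn.Conv3d(count, out_channels, kernel_size=3, padding=1)
            for count in channel_counts
        })

    def forward(self, x):
        # x has shape (batch, channels, depth, height, width).
        key = str(x.shape[1])
        if key not in self.branches:
            raise ValueError(f"unsupported channel count: {key}")
        # Only the matching branch runs; later layers would be shared.
        return self.branches[key](x)


stem = MultiChannelStem()
one_modality = torch.randn(1, 1, 8, 16, 16)
four_modalities = torch.randn(1, 4, 8, 16, 16)
features_one = stem(one_modality)
features_four = stem(four_modalities)
```

Both outputs have the same shape, so everything after the stem can be shared across tasks, which matches the point in Answer 1 that most parameters are shared.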

chengyu89527 · Answer 7 · Fri Aug 18 2023 19:45:57 GMT+0800 (China Standard Time)

I'm checking it now. Thanks.

chengyu89527 · Answer 8 · Fri Aug 18 2023 22:57:20 GMT+0800 (China Standard Time)

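Sorry to bother you again, but another question just occurred to me: if the input always had six channels, CT, T1, T2, T1c, T2f, and PET (with any missing ones left empty), would that have any negative effect?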

Yiwen Ye · Answer 9 · Sat Aug 19 2023 19:24:21 GMT+0800 (China Standard Time)

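I haven't tried that, but you could give it a try. I don't think it would cause any problems; it would just extract more redundant information.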
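
For anyone who wants to try the six-channel idea, a fixed-slot input could be assembled along the lines of the plain-Python sketch below. Nothing here comes from UniSeg: the slot order `SLOTS`, the function name `to_fixed_slots`, the nested-list image format, and the extra presence mask are all hypothetical choices for illustration.

```python
# Assumed fixed slot order, following the modalities named in the question.
SLOTS = ("ct", "t1", "t2", "t1c", "t2f", "pet")


def to_fixed_slots(volumes, slots=SLOTS):
    """Place each named modality in its fixed slot and zero-fill the rest.

    Returns (channels, present), where `present` holds 1 for slots that
    contain real data and 0 for slots that were left empty.
    """
    if not volumes:
        raise ValueError("at least one modality is required")
    unknown = set(volumes) - set(slots)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    first = next(iter(volumes.values()))
    height, width = len(first), len(first[0])
    channels, present = [], []
    for name in slots:
        if name in volumes:
            channels.append(volumes[name])
            present.append(1)
        else:
            # Empty slot: an all-zero image of the same size.
            channels.append([[0.0] * width for _ in range(height)])
            present.append(0)
    return channels, present


# Hypothetical case with only T1 and PET available.
channels, present = to_fixed_slots({"t1": [[1.0]], "pet": [[2.0]]})
```

Unlike plain zero-filling by count, each modality always lands in the same channel here. The presence mask is an optional addition that lets a model distinguish an empty slot from a genuinely dark image.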