bowang-lab / U-Mamba

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Home Page: https://arxiv.org/abs/2401.04722


How to avoid a very large sequence length?

IceClear opened this issue · comments

Hi, @JunMa11 . Thanks for your great work.
I have a small question related to the network setting.
Since the paper defines the sequence length L as the product of C, H, and W of the image patch, consider a 320x320 patch: if C is 32 (as I understand from the code), then at the first U-Net scale after the first pooling, L = 160x160x32 = 819,200 (~819.2K), which can be quite large.
Do I misunderstand some details? Or are there strategies to avoid this?
Thanks again and look forward to your help :)


Hi, @IceClear

We followed the common practice in vision transformers: there is a transpose operation, so C becomes the feature dimension and the sequence length is H×W (n_tokens), not C×H×W.

middle_feature_flat = middle_feature.view(B, C, n_tokens).transpose(-1, -2)
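To make the shape bookkeeping concrete, here is a minimal NumPy stand-in for the PyTorch line above (the sizes B=1, C=32, H=W=160 are illustrative values taken from the discussion, not fixed by the repository):

```python
import numpy as np

# Illustrative feature map after the first pooling of a 320x320 patch:
# batch B=1, C=32 channels, spatial size 160x160.
B, C, H, W = 1, 32, 160, 160
middle_feature = np.zeros((B, C, H, W))

# Flatten the spatial dims into tokens and swap the last two axes,
# mirroring view(B, C, n_tokens).transpose(-1, -2) in the repo code.
n_tokens = H * W
middle_feature_flat = middle_feature.reshape(B, C, n_tokens).transpose(0, 2, 1)

# The Mamba block then sees (B, L, D) with sequence length L = H*W = 25,600
# and feature dimension D = C = 32 -- not a single sequence of C*H*W = 819,200.
print(middle_feature_flat.shape)  # (1, 25600, 32)
```

So the channel axis never contributes to the sequence length; only the spatial positions become tokens.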