yuweihao / MambaOut

MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)

Repository from GitHub: https://github.com/yuweihao/MambaOut

Question about Hypothesis 2

x2ss opened this issue

Hi,

Thank you for sharing this insightful work.

I have a question about Hypothesis 2 on page 6 (arXiv version):

Hypothesis 2: It is still worthwhile to further explore the potential of SSM for visual
detection and segmentation since these tasks align with Characteristic 2, despite not fulfilling
Characteristic 1

• Characteristic 1: The task involves processing long sequences.
• Characteristic 2: The task requires causal token mixing mode.

However, at the end of page 5 (arXiv version), the paper says that "both detection on COCO and segmentation on ADE20K can be considered long-sequence tasks".

So should Hypothesis 2 instead read "It is still worthwhile to further explore the potential of SSM for visual
detection and segmentation since these tasks align with Characteristic 1, despite not fulfilling
Characteristic 2"?

If I have misunderstood anything, I would appreciate you pointing it out. I look forward to your response.

Thanks and best!

Hi @x2ss, thank you so much for your attention to our work. Yes, this is a typo. Thanks for the reminder; I will correct it in the next version.

Thanks for your reply.

I hope the next version of the paper (or perhaps another paper) will include more discussion of detection and segmentation tasks, since many detection and segmentation tasks see performance gains from employing Mamba.

Thanks and best!