MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on Mamba

LayerNorm after the SS2D

ydhongHIT opened this issue · comments

Hi, is applying an LN layer after the SSM a default setting in Mamba? If not, are there any ablation experiments on the function of the LN layer?

Our original motivation for adding LN was to avoid collapse during training. The output of S6 is often too large to be represented in float16 (it first overflows to inf and eventually becomes NaN).
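
For illustration, a minimal sketch of that idea: a LayerNorm placed on the SSM output before it continues through the block, so large S6 activations are rescaled into a float16-friendly range. The module names here (`ss2d_core`, `SS2DWithPostNorm`) are hypothetical stand-ins, not the actual VMamba classes.

```python
import torch
import torch.nn as nn

class SS2DWithPostNorm(nn.Module):
    """Hypothetical wrapper: normalize the SSM output before downstream
    projections / the residual add, so large S6 activations stay
    representable in float16."""
    def __init__(self, dim: int, ss2d_core: nn.Module):
        super().__init__()
        self.ss2d_core = ss2d_core          # stand-in for the actual SS2D/S6 block
        self.out_norm = nn.LayerNorm(dim)   # LN applied after the SSM output

    def forward(self, x):                   # x: (B, H, W, C)
        y = self.ss2d_core(x)               # S6 output can grow large under fp16
        y = self.out_norm(y)                # rescale before continuing the block
        return y
```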

Thank you for your reply. By the way, did you try LayerScale? I guess it may help mitigate overfitting in large models.
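
For reference, a minimal LayerScale sketch, assuming the usual formulation (a per-channel learnable scale initialized to a small value, as in CaiT); this is not taken from the VMamba code.

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Per-channel learnable scale, initialized small so each branch
    contributes little at the start of training."""
    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):       # x: (..., dim)
        return self.gamma * x   # scale the branch output before the residual add
```

It is typically applied to the output of the token-mixing or MLP branch before the residual addition, e.g. `x = x + layer_scale(block(x))`.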

Thank you for your advice; we'll try it in the future.