Question about running speed
chenzean opened this issue
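Hello author,
I have a question about speed: SS2D built on the mamba-ssm library runs extremely slowly for me (almost a minute passes without completing a single iteration). Can your code be used to speed it up?
Also, csms6s.py contains three functions: SelectiveScanOflex, SelectiveScanCore, and SelectiveScanMamba. What is the difference between them? I also don't seem to be able to view the source of functions such as selective_scan_cuda.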
When SS2D is extremely slow, it usually means something is wrong with your environment. selective_scan_oflex is slightly faster, but only in our case (i.e. SS2D).
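One way to start checking the environment is to see which compiled extensions Python can actually locate. The sketch below is only a starting point: the helper names `find_modules` and `report` are hypothetical, and the extension module names other than `selective_scan_cuda` are assumptions about what the builds install, so adjust the list to match your setup.

```python
import importlib.util

# Module names below are assumptions; edit the list to match your installation.
CANDIDATE_MODULES = [
    "torch",
    "mamba_ssm",
    "selective_scan_cuda",        # compiled extension mentioned in this thread
    "selective_scan_cuda_core",   # assumed name of the "core" kernel build
    "selective_scan_cuda_oflex",  # assumed name of the "oflex" kernel build
]


def find_modules(names):
    """Return {name: location or None} without importing the modules."""
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            found[name] = None
        else:
            found[name] = spec.origin or "<namespace package>"
    return found


def report(names=CANDIDATE_MODULES):
    """Print which modules are visible and, if torch is present, its CUDA info."""
    status = find_modules(names)
    for name, origin in status.items():
        print(f"{name:28s} {origin if origin else 'NOT FOUND'}")
    if status.get("torch"):
        import torch
        print("torch", torch.__version__,
              "| built for CUDA", torch.version.cuda,
              "| CUDA available:", torch.cuda.is_available())
    return status

# Call report() in the same Python environment you train in.
```

If a kernel module shows up as NOT FOUND, or its path points into a different environment than the one you expect, that is worth resolving before comparing speeds.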
`SelectiveScanCore` is a simplified version of `SelectiveScanMamba` with some features removed.
`SelectiveScanOflex` supports float16 input with float32 output. This is faster than float32 input with float32 output, and almost as stable. Float16 input with float16 output, by contrast, often leads to NaN.
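To illustrate why the output precision can matter, here is a toy recurrence in plain Python. It is not the CUDA kernel: `to_fp16` and `scan` are hypothetical helpers, Python floats stand in for the wider accumulator, and range overflow is only one plausible way that half-precision outputs could end up as NaN.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision (overflow becomes +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def scan(xs, a, b, c, out_fp16):
    """Toy linear recurrence h = a*h + b*x, y = c*h (illustration only)."""
    h = 0.0
    ys = []
    for x in xs:
        x = to_fp16(x)        # inputs are stored in half precision
        h = a * h + b * x     # state is accumulated in wider precision
        y = c * h
        ys.append(to_fp16(y) if out_fp16 else y)
    return ys


xs = [300.0] * 300
y_wide = scan(xs, a=1.0, b=1.0, c=1.0, out_fp16=False)
y_half = scan(xs, a=1.0, b=1.0, c=1.0, out_fp16=True)

# The wide output stays finite; the half output exceeds the fp16 range.
print("wide output finite:", all(math.isfinite(v) for v in y_wide))
print("half output has inf:", any(math.isinf(v) for v in y_half))

# A later operation such as mean-centering turns inf into NaN (inf - inf).
mean_half = sum(y_half) / len(y_half)
centered = [v - mean_half for v in y_half]
print("NaN after centering:", any(math.isnan(v) for v in centered))
```

In this toy setting the inputs are identical in both runs; only the precision of the stored output differs.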
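You said: "when ss2d is extremely slow, it is often the case that your environment has something wrong." How can I check whether my environment has a problem? SS2D does run, so I don't know where to look for errors. (With your files, my task takes only about 2 hours; with the official implementation it takes far, far longer.)
Also, if I want to change the scan path while also improving the running speed, do I need to modify csm_triton.py?
Thank you for your reply!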
- That is weird. Can you install mamba_ssm from source (not from pip or prebuilt wheels)?
- Yes. If you want to change the scan path, you can simply modify `csms6s.py`.
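To make "changing the scan path" concrete, here is a minimal pure-Python sketch of a four-direction scan and its merge over a flattened H x W grid. It is not the repository's implementation, which runs on tensors with PyTorch or Triton kernels. The function names `scan_orders`, `snake_order`, `cross_scan`, and `cross_merge` are hypothetical and chosen for illustration.

```python
def scan_orders(h, w):
    """Index orders for four routes over a row-major flattened h x w grid."""
    row_major = list(range(h * w))
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def snake_order(h, w):
    """Example of a custom path: left-to-right, then right-to-left, per row."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend(r * w + c for c in cols)
    return order


def cross_scan(flat, orders):
    """Reorder the flattened grid once per route."""
    return [[flat[i] for i in order] for order in orders]


def cross_merge(seqs, orders, size):
    """Undo each route's reordering and sum the results per grid cell."""
    out = [0] * size
    for seq, order in zip(seqs, orders):
        for pos, i in enumerate(order):
            out[i] += seq[pos]
    return out


h, w = 2, 3
flat = list(range(h * w))
orders = scan_orders(h, w)
seqs = cross_scan(flat, orders)
merged = cross_merge(seqs, orders, h * w)
print(seqs[1])              # column-major route
print(snake_order(h, w))    # custom route that could replace one of the four
print(merged)               # every cell visited once per route, then summed
```

The point of the sketch is that a scan and its merge share the same index orders, so a new path has to be changed in both places for the merge to put values back in the right cells.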
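OK, thank you for the reply.
On the first point: I will try that shortly.
On the second point: if I also want to keep the running speed of your code, is modifying csms6s.py still all that is needed? I ask because I am using the v05 version of your code, which appears to use CrossTriton and CrossMergeTriton.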
- The CUDA version is the fastest, but I am not good at writing CUDA code: I can make it correct, but I cannot make it efficient. So I chose Triton, a package for parallel-programming beginners.
- V9 is actually v05 combined with 'ln2d', which corresponds to the config `classification/configs/vssm/vmambav2_tiny_224.yaml`.
Hello author,
Does use_checkpoint in VSSBlock leave training and testing unaffected?
Hello author, I have one more question.
In SS2D, I see that d_state is set to 1 in the config file.
What factors should be considered when setting this parameter?
OK, I understand. Thank you very much for your reply.