llm-random / llm-random

Home page: https://llm-random.github.io/

How to utilize the repo to replicate MoE-Mamba

chazzmoney opened this issue · comments

Hello,

Thank you for providing the repo. I'm a fan of all three of the associated papers, so I really appreciate finding it. I've implemented MoT myself in two projects with positive results.

I'd like to replicate some of the work from your most recent paper, MoE-Mamba. However, I'm struggling to figure out how you used this repo to run the experiments described in the paper. Any information would be helpful.

Thank you,

Charles

I'm fumbling around here as well. I just came from mamba-chat's implementation of Mamba training and am trying to figure out how to integrate this repo into my workflow.
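In case it helps anyone else doing the same, here is a minimal sketch of the layout the paper describes: Mamba layers interleaved with switch-style (top-1 routed) MoE feed-forward layers, each behind a residual connection. Only `mamba_ssm` from state-spaces/mamba is a real dependency here; `SwitchMoE`, `MoEMambaBlock`, and all hyperparameters are my own illustration, not llm-random's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from mamba_ssm import Mamba  # real package from state-spaces/mamba (CUDA required)


class SwitchMoE(nn.Module):
    """Top-1 ("switch") routing over a set of expert feed-forward networks."""

    def __init__(self, d_model: int, n_experts: int, d_ff: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> route each token to a single expert
        flat = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(flat), dim=-1)
        top_p, top_i = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                # scale by the router probability so the router receives gradients
                out[mask] = expert(flat[mask]) * top_p[mask].unsqueeze(-1)
        return out.reshape_as(x)


class MoEMambaBlock(nn.Module):
    """One Mamba layer followed by one MoE layer, each with a residual connection."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = SwitchMoE(d_model, n_experts, d_ff=4 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mamba(self.norm1(x))
        x = x + self.moe(self.norm2(x))
        return x
```

The per-expert Python loop is slow and the sketch omits the load-balancing auxiliary loss; it's only meant to show the interleaving, not to reproduce the paper's training setup.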

I am also interested in this.

This is a fun repo to watch. Judging by https://github.com/llm-random/llm-random/pull/340/commits, they are still trying out multiple implementations of MoE with Mamba. I'm guessing this is a work in progress, so a direct implementation may take a while or require some DIY. I'm just looking for the entry point into the original Mamba model from state-spaces, shown in the snippet below.
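For the raw entry point, state-spaces/mamba ships the `mamba_ssm` package (pip install mamba-ssm), and the `Mamba` module is the basic building block; a minimal usage sketch (a CUDA GPU is required):

```python
import torch
from mamba_ssm import Mamba

# The basic Mamba block; d_state/d_conv/expand shown with their defaults.
block = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).to("cuda")
x = torch.randn(2, 1024, 256, device="cuda")  # (batch, seq_len, d_model)
y = block(x)  # output has the same shape as the input
print(y.shape)  # torch.Size([2, 1024, 256])
```

If you want the full language model rather than a single block, the repo also provides `MambaLMHeadModel` in `mamba_ssm.models.mixer_seq_simple`.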

Is there any way to replicate MoE-Mamba now? 🤔