llm-random / llm-random

Home page: https://llm-random.github.io/

How to utilize the repo to replicate MoE-Mamba

chazzmoney opened this issue · comments

Hello,

Thank you for providing the repo. I'm a fan of all three of the associated papers, so I really appreciate finding it. I've implemented MoT myself in two projects with positive results.

I'd like to replicate some of the work from your most recent paper, MoE-Mamba. However, I'm struggling to figure out how you used this repo to run the experiments described in the paper. Any information would be helpful.

Thank you,

Charles

I'm fumbling around here as well. I just came from mamba-chat's implementation of Mamba training and am trying to figure out how to integrate this repo into my workflow.
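In case it helps anyone else doing the same, here is a minimal sketch of the layout the paper describes: Mamba layers interleaved with switch-style (top-1 routed) MoE feed-forward layers, each behind a residual connection. Only `mamba_ssm` from state-spaces/mamba is a real dependency here; `SwitchMoE`, `MoEMambaBlock`, and all hyperparameters are my own illustration, not llm-random's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from mamba_ssm import Mamba  # real package from state-spaces/mamba (CUDA required)


class SwitchMoE(nn.Module):
    """Top-1 ("switch") routing over a set of expert feed-forward networks."""

    def __init__(self, d_model: int, n_experts: int, d_ff: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> route each token to a single expert
        flat = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(flat), dim=-1)
        top_p, top_i = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                # scale by the router probability so the router receives gradients
                out[mask] = expert(flat[mask]) * top_p[mask].unsqueeze(-1)
        return out.reshape_as(x)


class MoEMambaBlock(nn.Module):
    """One Mamba layer followed by one MoE layer, each with a residual connection."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = SwitchMoE(d_model, n_experts, d_ff=4 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mamba(self.norm1(x))
        x = x + self.moe(self.norm2(x))
        return x
```

The per-expert Python loop is slow and the sketch omits the load-balancing auxiliary loss; it's only meant to show the interleaving, not to reproduce the paper's training setup.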

I am also interested in this.

This is a fun repo to watch. Judging by https://github.com/llm-random/llm-random/pull/340/commits, they are still trying out multiple implementations of MoE with Mamba. I'm guessing this is a work in progress, so a direct implementation may take a while or require some DIY. I'm just looking for the entry point into the original Mamba model from state-spaces, shown in the snippet below.
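For the raw entry point, state-spaces/mamba ships the `mamba_ssm` package (pip install mamba-ssm), and the `Mamba` module is the basic building block; a minimal usage sketch (a CUDA GPU is required):

```python
import torch
from mamba_ssm import Mamba

# The basic Mamba block; d_state/d_conv/expand shown with their defaults.
block = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).to("cuda")
x = torch.randn(2, 1024, 256, device="cuda")  # (batch, seq_len, d_model)
y = block(x)  # output has the same shape as the input
print(y.shape)  # torch.Size([2, 1024, 256])
```

If you want the full language model rather than a single block, the repo also provides `MambaLMHeadModel` in `mamba_ssm.models.mixer_seq_simple`.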

Is there any way to replicate MoE-Mamba now? 🤔