databricks / megablocks

Add a fine-tune script for JetMoE

shamanez opened this issue · comments

@tgale96

The JetMoE technical report mentions that the model was trained with MegaBlocks and Megatron.

The authors also shared this fork of MegaBlocks that was used during training.

Could you please let us know how we can proceed with a fine-tuning script?
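To make the request concrete, below is the kind of fine-tuning loop we have in mind, sketched with the plain Hugging Face `Trainer` rather than the Megatron + MegaBlocks stack used in the report. The `jetmoe/jetmoe-8b` checkpoint name, the wikitext dataset, and all hyperparameters are placeholder assumptions; the open question is how to do the equivalent on top of MegaBlocks.

```python
# Minimal JetMoE fine-tuning sketch using the Hugging Face stack.
# Assumptions (not from this repo): the "jetmoe/jetmoe-8b" checkpoint,
# a text dataset with a "text" column, and a transformers version that
# can load JetMoE (trust_remote_code=True as a fallback).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "jetmoe/jetmoe-8b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Many causal-LM tokenizers ship without a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Any text dataset works here; wikitext-2 is just a small example.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="jetmoe-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```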

+1 to @shamanez's request. It would be great to elaborate on how to integrate JetMoE with Megatron via MegaBlocks, both for pretraining and for fine-tuning.
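For context, the part we can piece together from the MegaBlocks side is roughly how a dropless-MoE layer is constructed; what is unclear is how to map JetMoE's configuration and checkpoints onto it inside Megatron. A minimal sketch, assuming the `megablocks.layers.arguments.Arguments` / `megablocks.layers.dmoe.dMoE` API (field names may vary across versions, and the sizes below are illustrative rather than taken from the JetMoE config):

```python
# Sketch of constructing a MegaBlocks dropless-MoE (dMoE) layer with a
# JetMoE-like shape. Field names and defaults may differ between
# MegaBlocks versions; sizes are illustrative only.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=2048,      # model hidden dimension (illustrative)
    ffn_hidden_size=5632,  # per-expert FFN dimension (illustrative)
    moe_num_experts=8,     # number of experts (illustrative)
    moe_top_k=2,           # top-2 routing (illustrative)
    bf16=True,
)

# MegaBlocks kernels expect a CUDA device and half/bfloat16 activations.
layer = dMoE(args).cuda().to(torch.bfloat16)

# Assumed (sequence, batch, hidden) layout, as in Megatron-style models.
x = torch.randn(128, 4, 2048, device="cuda", dtype=torch.bfloat16)

out = layer(x)
if isinstance(out, tuple):  # some versions return (output, bias)
    out, _ = out
print(out.shape)
```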

We'd love a community PR to upstream the changes from JetMoE!