wesley7137 / mamba-finetune

Fine-tune Mamba using DeepSpeed


This repo fine-tunes Mamba using DeepSpeed.

Mamba original paper: https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf

Single-GPU use

python finetune_mamba.py --output_dir path/to/your/dir --model_name_or_path path/to/your/model

Multi-GPU, single-node use through the deepspeed launcher

deepspeed finetune_mamba.py --output_dir path/to/your/dir --model_name_or_path path/to/your/model --deepspeed path/to/your/deepspeed_config.json
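A minimal deepspeed_config.json for the command above might look like the sketch below (ZeRO stage 2 with fp16; the specific stage and batch settings are illustrative assumptions, not the repo's shipped config — tune them to your hardware):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
```

The "auto" values let the HuggingFace Trainer integration fill in batch sizes from its own arguments, which avoids keeping two copies of the same numbers in sync.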

Multi-GPU, multi-node use through torchrun

torchrun --nproc_per_node=2 --nnodes=4 finetune_mamba.py --output_dir path/to/your/dir --model_name_or_path path/to/your/model --deepspeed path/to/your/deepspeed_config.json

For multi-node runs, launch this command on each node with the appropriate --node_rank and rendezvous settings (e.g. --rdzv_endpoint) so the nodes can find each other.
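All three invocations pass the same script flags. A minimal argparse sketch of that interface, inferred from the commands above (the model name below is illustrative, not a repo default):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of finetune_mamba.py's CLI, reconstructed
    # from the README commands; the real script may differ.
    p = argparse.ArgumentParser(description="Fine-tune Mamba, optionally with DeepSpeed")
    p.add_argument("--output_dir", required=True,
                   help="where checkpoints and logs are written")
    p.add_argument("--model_name_or_path", required=True,
                   help="HF hub id or local path of the base Mamba model")
    p.add_argument("--deepspeed", default=None,
                   help="path to a DeepSpeed config JSON; omit for plain single-GPU runs")
    return p

# Parse a single-GPU style invocation (no --deepspeed flag).
args = build_parser().parse_args(
    ["--output_dir", "out", "--model_name_or_path", "state-spaces/mamba-130m"]
)
print(args.output_dir)  # → out
```

When launched via deepspeed or torchrun, the launcher spawns one process per GPU and each process parses these same flags; the --deepspeed config is only consulted when that flag is present.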

About


License: Apache License 2.0


Languages

Python 100.0%