Mamba-LLaVA

Install

First, follow the LLaVA README to create the base environment.

Then install the additional packages required for Mamba:

pip install causal-conv1d
pip install mamba-ssm

Train

Pretrain (feature alignment)

Please download the 558K subset of the LAION-CC-SBU dataset with BLIP captions used in the paper here.

Pretraining takes around 11 hours for Mamba-2.8B-LLaVA-v1.5 on 4x RTX 3090 (24 GB).

Training script without DeepSpeed or bf16 (plain fp32): pretrain_fp32.sh.

  • --mm_projector_type mlp2x_gelu: the two-layer MLP vision-language connector (a sketch follows this list).
  • --vision_tower openai/clip-vit-large-patch14-336: the CLIP ViT-L/14 vision encoder at 336px resolution.

Visual Instruction Tuning

coming soon ...

About

License: Apache License 2.0

