huu4ontocord / M3rlin-fmengine

M3 Training Using FMengine


FMEngine

FMEngine is a utility library for training very large foundation models. The goal of FMEngine is to provide:

  • An ergonomic interface for training foundation models. It is easy enough for beginners to use, yet flexible enough for advanced users to customize their training.
  • Efficient optimizations built in. FMEngine ships with Flash Attention and various fused ops to accelerate training.
  • HPC-friendly installation with pre-built docker and singularity/apptainer containers. FMEngine is mainly designed and tested on Slurm clusters. We provide starter scripts for running FMEngine on Slurm clusters.
  • Compatible with existing frameworks and tools, particularly with HuggingFace. Since FMEngine is built with DeepSpeed, it is also compatible with all DeepSpeed features.
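Because FMEngine builds on DeepSpeed, a training run is driven by a standard DeepSpeed JSON config. The fragment below is a minimal illustrative sketch using common DeepSpeed fields; the specific values (batch size, ZeRO stage, learning rate) are assumptions for the example, not FMEngine defaults.

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": { "stage": 2 },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": 1e-4, "weight_decay": 0.01 }
  }
}
```

Any option DeepSpeed understands (ZeRO-3 offload, gradient clipping, LR schedulers, etc.) can be used the same way, which is what makes FMEngine compatible with the full DeepSpeed feature set.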

For now, FMEngine supports two families of models: GPT-NeoX and Llama.

| Model | #Params | #Layers | #Heads | #Dim | Pretrained Checkpoint | Flash Attention |
|---|---|---|---|---|---|---|
| Pythia-160M | 85M | 12 | 12 | 768 | Download | Yes |
| Pythia-1.4B | 1.2B | 24 | 16 | 2048 | Download | Yes |
| Pythia-2.8B | 2.5B | 32 | 32 | 2560 | Download | Yes |
| OpenLlama-3B | tba | tba | tba | tba | Download | Yes |
| Llama-2-70b | tba | tba | tba | tba | tba | Yes |
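The #Params column is smaller than the model names suggest (e.g. 85M for Pythia-160M) because it counts non-embedding parameters only. A standard approximation for a decoder-only transformer is 12 · layers · dim², which reproduces the table's numbers. A small sketch (the function name is ours, not an FMEngine API):

```python
def approx_nonembedding_params(n_layers: int, d_model: int) -> int:
    """Standard non-embedding parameter estimate for a decoder-only
    transformer: ~4*d^2 for attention (Q, K, V, O projections) plus
    ~8*d^2 for the MLP (two d x 4d projections) per layer."""
    return 12 * n_layers * d_model ** 2

# Check against the table rows (layers, dim taken from the columns above):
for name, layers, dim in [
    ("Pythia-160M", 12, 768),    # ~85M non-embedding params
    ("Pythia-1.4B", 24, 2048),   # ~1.2B
    ("Pythia-2.8B", 32, 2560),   # ~2.5B
]:
    print(f"{name}: {approx_nonembedding_params(layers, dim):,}")
```

The embedding matrix (vocab_size · dim) accounts for the gap between these counts and the totals in the model names.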

Acknowledgement

FMEngine is primarily implemented and maintained by the Efficient Architecture and Systems Labs @ ETH Zurich.

About

M3 Training Using FMengine

License:Apache License 2.0


Languages

Python 90.7%, Jupyter Notebook 5.7%, Shell 2.9%, Dockerfile 0.7%, Makefile 0.0%