Training Code for LLM360 K2-65B

This repository contains the code for training K2-65B, a 65-billion-parameter large language model from LLM360.

Note: This repository is under active development. If you have suggestions or find bugs, please open a GitHub issue or reach out.

To launch training, run:

bash scripts/pretrain_65b.sh

To convert model checkpoints from Megatron to HuggingFace format, run:

python convert_ckpt_to_hf.py --load_path <megatron_ckpt_dir> --save_path <huggingface_ckpt_dir>
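
When several checkpoints need converting, the command above can be wrapped in a small helper. The following is a minimal sketch, not part of this repository: the function names `build_convert_command` and `convert_checkpoint` are hypothetical, and it assumes only the two flags shown above (`--load_path` and `--save_path`) and that it is run from the repository root, where `convert_ckpt_to_hf.py` lives. By default it only returns the command it would run, so the arguments can be inspected first.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(load_path, save_path, script="convert_ckpt_to_hf.py"):
    """Return the argv list for one Megatron -> HuggingFace conversion.

    Hypothetical helper: it uses only the two flags documented in this README.
    """
    return [
        sys.executable, str(script),
        "--load_path", str(load_path),
        "--save_path", str(save_path),
    ]


def convert_checkpoint(load_path, save_path, dry_run=True):
    """Check that the input directory exists, then run (or just return) the command."""
    load_path, save_path = Path(load_path), Path(save_path)
    if not load_path.is_dir():
        raise FileNotFoundError(f"Megatron checkpoint directory not found: {load_path}")
    cmd = build_convert_command(load_path, save_path)
    if not dry_run:
        # check=True raises CalledProcessError if the conversion script fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `convert_checkpoint(<megatron_ckpt_dir>, <huggingface_ckpt_dir>, dry_run=False)` would then run the conversion; leaving `dry_run=True` returns the argument list without executing anything.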

License

Apache License 2.0
