LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training

Home Page: https://libai.readthedocs.io

LiBai

Introduction

English | 简体中文

LiBai is an open-source toolbox for large-scale model training, built on OneFlow. The main branch works with OneFlow 0.7.0.

Highlights
  • Support for a collection of parallel training components

    LiBai provides multiple parallelism strategies such as Data Parallelism, Tensor Parallelism, and Pipeline Parallelism, and is extensible to new parallelism schemes; see the configuration sketch after this list.

  • Varied training techniques

    LiBai provides many out-of-the-box training techniques such as Distributed Training, Mixed Precision Training, Activation Checkpointing (Recomputation), Gradient Accumulation, and the Zero Redundancy Optimizer (ZeRO); the sketch after this list shows how these are toggled.

  • Support for both CV and NLP tasks

    LiBai has predefined data processing pipelines for both CV and NLP datasets such as CIFAR, ImageNet, and the BERT dataset.

  • Easy to use

    LiBai's components are designed to be modular for ease of use:

    • LazyConfig system for more flexible syntax and no predefined structures
    • Friendly trainer and engine
    • Usable as a library to support building research projects on top of it; see projects/ for examples built on LiBai
  • High Efficiency
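
Under LiBai's LazyConfig system, both the parallelism degrees and the training techniques above are toggled as plain Python fields on the training config. The following is a minimal sketch, not a verified recipe: the exact keys (get_config, train.dist.*, train.amp.*, train.activation_checkpoint.*, train.zero_optimization.*) are assumptions based on LiBai's config conventions and should be checked against the configs/common/ files shipped with your version.

# my_config.py -- a hedged sketch of a LiBai LazyConfig override.
# Field names are assumed; verify against configs/common/train.py.
from libai.config import get_config

train = get_config("common/train.py").train

# Hybrid parallelism: 2 (data) x 2 (tensor) x 2 (pipeline) = 8 GPUs total.
train.dist.data_parallel_size = 2
train.dist.tensor_parallel_size = 2
train.dist.pipeline_parallel_size = 2

# Training techniques from the list above.
train.amp.enabled = True                    # Mixed Precision Training
train.activation_checkpoint.enabled = True  # Activation Checkpointing
train.num_accumulation_steps = 4            # Gradient Accumulation
train.zero_optimization.enabled = True      # ZeRO
train.zero_optimization.stage = 1           # ZeRO stage (assumed field)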

Installation

See Installation instructions.

Getting Started

See Quick Run for the basic usage of LiBai.

Documentation

See LiBai's documentation for full API documentation and tutorials.

ChangeLog

Beta 0.1.0 was released on 22/03/2022. The main features and supported models in version 0.1.0 are as follows:

Features:

  • Support Data Parallelism
  • Support 1D Tensor Parallelism
  • Support Pipeline Parallelism
  • Unified distributed layers for both single-GPU and multi-GPU training (see the sketch after this list)
  • LazyConfig system for more flexible syntax and no predefined structures
  • Easy-to-use trainer and engine
  • Support both CV and NLP data processing
  • Mixed Precision Training
  • Activation Checkpointing
  • Gradient Accumulation
  • Gradient Clipping
  • Zero Redundancy Optimizer (ZeRO)
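
The "Unified distributed layers" and "LazyConfig system" entries above combine as in the sketch below. It is illustrative only: ToyMLP and its parameters are made up for this example, and the parallel="col"/"row" arguments to libai.layers.Linear follow LiBai's layer documentation but should be treated as assumptions for your installed version.

import oneflow.nn as nn
from libai.config import LazyCall
from libai.layers import LayerNorm, Linear

class ToyMLP(nn.Module):
    """Hypothetical module for illustration only."""

    def __init__(self, hidden_size=768):
        super().__init__()
        # The same code path serves single-GPU and multi-GPU runs: each
        # libai.layers layer places its parameters according to the
        # globally configured parallelism.
        self.norm = LayerNorm(hidden_size)
        self.fc1 = Linear(hidden_size, 4 * hidden_size, parallel="col")
        self.fc2 = Linear(4 * hidden_size, hidden_size, parallel="row")

    def forward(self, x):
        return self.fc2(self.fc1(self.norm(x)))

# LazyConfig syntax: LazyCall records how to build the object instead of
# building it, so configs stay plain Python with no predefined structure.
model = LazyCall(ToyMLP)(hidden_size=768)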

Supported Models:

See the changelog for details and release history.

Contributing

We appreciate all contributions to improve LiBai. See CONTRIBUTING for the contributing guideline.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful for your research, please consider citing:

@misc{of2021libai,
  author =       {Xingyu Liao and Peng Cheng and Tianhe Ren and Depeng Liang and
                  Kai Dang and Yi Wang and Xiaoyu Xu},
  title =        {LiBai},
  howpublished = {\url{https://github.com/Oneflow-Inc/libai}},
  year =         {2021}
}
