Mini-Lightning

Introduction

Mini-Lightning is a lightweight machine learning training library: a mini version of PyTorch-Lightning in only 1k lines of code. Its advantages are speed, conciseness, and flexibility.
Existing features: support for DDP (multi-node and multi-GPU), Sync-BN, DP, MP (model parallelism), AMP, gradient accumulation, warmup and lr_scheduler, gradient clipping, TensorBoard, Hugging Face, PEFT, LLMs, torchmetrics, model and result saving, a beautiful console log, etc.
Only the minimal interfaces are exposed, which keeps the library simple and easy to read, use, and extend.
Examples can be found in examples/.
If you run into any problems or find a bug, please raise an issue. Thank you.
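
To make the "warmup and lr_scheduler" feature above more concrete, here is a minimal sketch of a linear warmup followed by cosine decay, written in plain Python. It is an illustration only and not Mini-Lightning's implementation: the function name `lr_at_step` and all default values are assumptions chosen for the example.

```python
import math


def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Hypothetical schedule: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 99, 100, 550, 1000):
        print(step, lr_at_step(step))
```

The warmup phase avoids large updates while the optimizer statistics are still unstable; the decay phase then lowers the learning rate smoothly toward `min_lr`.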

Installation

Create a virtual environment with Python (>= 3.8).
Install the latest version of PyTorch (>= 1.12) for your CUDA version from the official PyTorch website.
Install mini-lightning:

# From PyPI
pip install mini-lightning -U

# Or download the repository files locally, go to the folder
# that contains setup.py, and run the command below.
# (Recommended) This gives you the latest features and bug fixes.
pip install -e .  # -e: editable mode
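
The version requirements in this section (Python >= 3.8, Torch >= 1.12) can be checked with a short script. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `check_requirements` are hypothetical and are not part of Mini-Lightning.

```python
import sys


def version_tuple(version):
    """Turn a version string such as '1.12.1+cu113' into (1, 12, 1)."""
    core = version.split("+")[0]  # drop local build tags like '+cu113'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(min_python=(3, 8), min_torch=(1, 12)):
    """Return a list of problems found; an empty list means all checks passed."""
    if sys.version_info[:2] < min_python:
        return ["Python %d.%d or newer is required" % min_python]
    # importlib.metadata is available from Python 3.8 onwards.
    from importlib import metadata
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return ["torch is not installed"]
    if version_tuple(torch_version) < min_torch:
        return ["torch %s is older than %d.%d" % ((torch_version,) + min_torch)]
    return []


if __name__ == "__main__":
    problems = check_requirements()
    print("\n".join(problems) if problems else "Environment looks OK")
```

The check reads the installed package metadata instead of importing torch, so it stays fast and works even when CUDA libraries are misconfigured.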

Examples

First, install Mini-Lightning.
Then run the following examples:

### test environment
python examples/test_env.py

### cv
pip install "torchvision>=0.13"
python examples/cv.py
# cv+dp (not recommended, please use DDP)
python examples/cv.py  # set device_ids=[0, 1]

### nlp: bert gpt
pip install "transformers>=4.25" "datasets>=2.7" "peft>=0.3"
python examples/nlp_bert_mlm.py
python examples/nlp_bert_seq_cls.py
python examples/nlp_gpt_lm.py
python examples/nlp_gpt_seq_cls.py
# sft
python examples/nlp_gpt_zh_sft_adapter.py
python examples/nlp_gpt_zh_sft_lora.py
# llm (model parallelism)
#   Ref: https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary
python examples/nlp_baichuan_sft_lora.py
#   Ref: https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary
python examples/nlp_chatglm2_sft_lora.py

### dqn
pip install "gym>=0.26.2" "pygame>=2.1.2"
python examples/dqn.py

### gan
pip install "torchvision>=0.13"
python examples/gan.py

### contrastive learning
pip install "torchvision>=0.13" "scikit-learn>=1.2"
python examples/cl.py
# cl+ddp
torchrun --nproc_per_node 2 examples/cl_ddp.py --device 0,1

### gnn
# install torch_geometric
#   Ref: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
python examples/gnn_node.py
python examples/gnn_edge.py
python examples/gnn_graph.py

### ae
pip install "torchvision>=0.13" "scikit-learn>=1.2"
python examples/ae.py

### vae
pip install "torchvision>=0.13"
python examples/vae.py

### meta learning
pip install "torchvision>=0.13"
python examples/meta_learning.py


########## ddp
# torchrun (Recommended)
#   Ref: https://pytorch.org/docs/stable/elastic/run.html
# spawn
#   Ref: https://pytorch.org/docs/stable/notes/ddp.html
## single-node, multi-gpu
torchrun --nproc_per_node 2 examples/cv_ddp.py --device 0,1
python examples/cv_ddp_spawn.py  # set device_ids=[0, 1]

## multi-node
# Default: --master_port 29500. Set --master_port explicitly to prevent port conflicts.
torchrun --nnodes 2 --node_rank 0 --master_addr 127.0.0.1 --nproc_per_node 4 examples/cv_ddp.py --device 0,1,2,3
torchrun --nnodes 2 --node_rank 1 --master_addr xxx.xxx.xxx.xxx --nproc_per_node 4 examples/cv_ddp.py --device 0,1,2,3
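
Each process started by torchrun receives its position in the job through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`). The sketch below shows one way a script could read them; the helper names `read_torchrun_env` and `expected_global_rank` are hypothetical, and how the example scripts in this repository consume these values is not shown here. The rank formula assumes the static launch used above, with the same `--nproc_per_node` on every node.

```python
import os


def read_torchrun_env(environ=None):
    """Read the variables torchrun sets, falling back to single-process defaults."""
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index within this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", 29500)),
    }


def expected_global_rank(node_rank, nproc_per_node, local_rank):
    """Assumed layout: ranks are numbered node by node."""
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    info = read_torchrun_env()
    print("rank %d of %d (local rank %d)"
          % (info["rank"], info["world_size"], info["local_rank"]))
```

For the two-node command above (`--nnodes 2 --nproc_per_node 4`), the world size is 8, and under this layout the process with local rank 2 on the node with `--node_rank 1` has global rank 6.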

TODO

Automatic parameter adjustment
Examples: Audio, Meta-learning, Diffusion, Auto-regressive, Reinforcement Learning
Support multi-GPU testing
Output a .log file
