tridao/zoo

We use the template from https://github.com/ashleve/lightning-hydra-template. Please read the instructions there to understand the repo structure.

GPT2 training

To train GPT2 on OpenWebText with 8 GPUs:

python run.py experiment=owt/gpt2s-flash trainer.devices=8
python run.py experiment=owt/gpt2m-flash trainer.devices=8
python run.py experiment=owt/gpt2l-flash trainer.devices=8

To train with bf16 instead of fp16, add trainer.precision=bf16.
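
As a minimal sketch of how the overrides above combine, the snippet below assembles one of the listed commands with the bf16 override appended and prints it as a dry run. The shell variable names (`EXPERIMENT`, `DEVICES`, `PRECISION`, `CMD`) are illustrative and not part of the repo; only the `experiment=`, `trainer.devices=`, and `trainer.precision=` overrides come from the instructions above.

```shell
# Dry run: build the training command and print it.
# Variable names here are hypothetical helpers, not repo settings.
EXPERIMENT=owt/gpt2s-flash   # or owt/gpt2m-flash, owt/gpt2l-flash
DEVICES=8
PRECISION=bf16

CMD="python run.py experiment=${EXPERIMENT} trainer.devices=${DEVICES} trainer.precision=${PRECISION}"
echo "$CMD"
```

Running the printed command from the repository root should start training; drop the `trainer.precision` override to keep the default fp16.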

Requirements

Python 3.8+, PyTorch 1.9+, torchvision, torchtext, pytorch-fast-transformers, munch, einops, timm, hydra-core, hydra-colorlog, python-dotenv, rich, pytorch-lightning, triton. We recommend CUDA 11.8 (e.g., using NVIDIA's PyTorch Docker image from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).

We provide a Dockerfile that lists all the required packages.

This repo includes the following CUDA extensions:

Fused dropout + residual + LayerNorm, adapted from Apex's FastLayerNorm.

cd csrc/layer_norm && pip install .

Fused matmul + bias (forward and backward), and fused matmul + bias + gelu (forward and backward), adapted from Apex's FusedDense.

cd csrc/fused_dense_lib && pip install .

Optimized cross-entropy loss, adapted from Apex's Xentropy.

cd csrc/xentropy && pip install .
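
The three installs above can also be run in one pass. The loop below is a sketch that assumes it is started from the repository root; the `EXTENSIONS` variable and the skip/failure messages are illustrative additions, while the directory names and the `pip install .` step are taken from the commands above.

```shell
# Build each CUDA extension in a subshell so the working directory
# is restored after every install. Run from the repository root.
EXTENSIONS="layer_norm fused_dense_lib xentropy"

for ext in $EXTENSIONS; do
    if [ -d "csrc/$ext" ]; then
        (cd "csrc/$ext" && pip install .) || echo "build failed: $ext"
    else
        # Guard for the case where the loop is run from the wrong directory.
        echo "skipping csrc/$ext (directory not found)"
    fi
done
```

Building these extensions needs a CUDA toolchain matching the installed PyTorch, so a failure in one directory is reported without stopping the remaining builds.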

About

Apache License 2.0
