Zhen Zhang's repositories
dt-autorun
Autorun distributed training experiments and gather logs
alpa
Training and serving large-scale neural networks
flash-attention
Fast and memory-efficient exact attention
grace
GRACE - GRAdient ComprEssion for distributed deep learning
kickstart.nvim
A launch point for your personal nvim configuration
Megatron-LM
Ongoing research training transformer language models at scale, including BERT & GPT-2
model-prepare
Generate models for serving
nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
nccl-tests
NCCL Tests
open_clip
An open source implementation of CLIP.
ratex
Yuan's fork of Ratex
slapo
A schedule language for progressive optimization of large deep learning model training
split-annotations
Source code for the split annotations project.
UGATIT-pytorch
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation