Haibin Lin's repositories
pytorch-OpCounter
Count the MACs / FLOPs of your PyTorch model.
builder
Continuous builder and binary build scripts for PyTorch
byteps
A high-performance, general parameter server (PS) framework for distributed training
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
cutlass
CUDA Templates for Linear Algebra Subroutines
d2l-tvm
Dive into Deep Learning Compiler
DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
elpa
A scalable eigensolver for dense, symmetric (Hermitian) matrices (fork of https://gitlab.mpcdf.mpg.de/elpa/elpa.git)
evals
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet.
HugeCTR
HugeCTR is a high-efficiency GPU framework designed for Click-Through Rate (CTR) estimation training
libfabric
Open Fabric Interfaces
lingvo
Lingvo
matxscript
The model pre- and post-processing framework
Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Mini-Conf
Run a conference from your backyard.
tensorflow
An Open Source Machine Learning Framework for Everyone
trax
Trax — Deep Learning with Clear Code and Speed
ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
ucx-py
Python bindings for UCX