H (on leave)'s repositories
pytorch-OpCounter
Count the MACs / FLOPs of your PyTorch model.
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
builder
Continuous builder and binary build scripts for PyTorch
byteps
A high-performance, general parameter-server (PS) framework for distributed training
CLIP
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image
cutlass
CUDA Templates for Linear Algebra Subroutines
DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
deepscaler
Democratizing Reinforcement Learning for LLMs
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeedExamples
Example models using DeepSpeed
elpa
A scalable eigensolver for dense, symmetric (Hermitian) matrices (fork of https://gitlab.mpcdf.mpg.de/elpa/elpa.git)
evals
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
flash-attention
Fast and memory-efficient exact attention
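The repo exposes CUDA-only entry points such as `flash_attn_func`; as a CPU-runnable sketch of the same exact-attention computation, PyTorch's built-in `F.scaled_dot_product_attention` (which can dispatch to a FlashAttention kernel on supported GPUs) shows the expected tensor layout:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) layout, as SDPA expects.
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Fused exact attention: softmax(q @ k^T / sqrt(d)) @ v,
# without materializing the full seq x seq score matrix.
out = F.scaled_dot_product_attention(q, k, v)
```

Note that `flash_attn_func` itself uses a `(batch, seq_len, heads, head_dim)` layout instead; this sketch only illustrates the computation, not the repo's exact signature.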
gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
HugeCTR
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
lingvo
Lingvo: a TensorFlow framework for building sequence models (e.g. speech recognition, machine translation)
matxscript
A framework for model pre- and post-processing
Megatron-LM
Ongoing research training transformer language models at scale, including BERT and GPT-2
openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
slapo
A schedule language for large model training
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance and lower memory utilization in both training and inference.
ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
ucx-py
Python bindings for UCX