menelaus's repositories
bigbird
Transformers for Longer Sequences
CUDALibrarySamples
CUDA Library Samples
DeepLearningExamples
Deep Learning Examples
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
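A minimal sketch of what "makes distributed training easy" looks like in practice: wrap a PyTorch model with deepspeed.initialize and let the returned engine drive backward and step. The toy model, loss, and inline config here are illustrative assumptions, not part of the repository.

```python
# Minimal DeepSpeed sketch (illustrative model and config, run under the
# deepspeed launcher in a distributed setting).
import torch
import deepspeed

model = torch.nn.Linear(784, 10)  # placeholder model

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# deepspeed.initialize returns an engine that owns the optimizer,
# gradient handling, and any configured ZeRO/parallelism features.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(32, 784).to(engine.device)
loss = engine(x).pow(2).mean()  # toy loss
engine.backward(loss)           # engine-managed backward
engine.step()                   # engine-managed optimizer step
```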
google-research
Google Research
iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
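A minimal sketch of the three composable transformations the description names, using only core JAX API calls:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) ** 2)

df = jax.grad(f)             # differentiate
batched = jax.vmap(jnp.sin)  # vectorize over a leading batch axis
fast_df = jax.jit(df)        # JIT-compile to CPU/GPU/TPU via XLA

x = jnp.arange(4.0)
print(df(x), batched(x), fast_df(x))
```

The transformations compose: jax.jit(jax.grad(f)) is itself an ordinary Python function that can be transformed again.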
Megatron-LM
Ongoing research training transformer models at scale
openshmem-examples
Miscellaneous OpenSHMEM examples
paxml
Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOPs utilization rates.
SHARK-dev
SHARK - High-Performance Machine Learning for CPUs, GPUs, Accelerators, and Heterogeneous Clusters
tensorflow
An Open Source Machine Learning Framework for Everyone
training
Reference implementations of MLPerf™ training benchmarks
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
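A minimal sketch of the FP8 workflow through Transformer Engine's PyTorch API; the layer size and recipe settings are illustrative assumptions.

```python
# FP8 sketch: matmuls inside the fp8_autocast context run in 8-bit
# floating point on supported (Hopper/Ada) GPUs.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024, bias=True).cuda()   # illustrative size
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()  # backward runs outside the autocast context
```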
triton
Development repository for the Triton language and compiler
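To give a feel for the Triton language itself, here is the canonical element-wise vector-add kernel in the style of the project's tutorials:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)   # one program instance per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```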
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
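A minimal usage sketch of the package's ViT class; the hyperparameters below are illustrative, in the spirit of the repository's README.

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size=256, patch_size=32, num_classes=1000,
    dim=1024, depth=6, heads=16, mlp_dim=2048,
)

img = torch.randn(1, 3, 256, 256)
preds = v(img)  # (1, 1000) class logits from a single transformer encoder
```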
xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
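XLA is usually invoked through a frontend rather than called directly; a minimal sketch using TensorFlow's jit_compile flag (jax.jit compiles through XLA in the same spirit):

```python
import tensorflow as tf

@tf.function(jit_compile=True)   # compile this function with XLA
def scaled_sum(x, y):
    return tf.reduce_sum(x * y)

print(scaled_sum(tf.ones([8]), tf.range(8.0)))
```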