Jinze Xue's starred repositories
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
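A minimal usage sketch, loosely following the repo's README; it assumes the `mx` package exposes `finalize_mx_specs` and `mx_mapping.inject_pyt_ops` and that spec keys like `w_elem_format` are current (the API may differ between releases):

```python
from mx import finalize_mx_specs, mx_mapping

# MX spec (assumed keys): emulate MXFP8 weights/activations with
# 32-element shared-scale blocks
mx_specs = finalize_mx_specs({
    'w_elem_format': 'fp8_e4m3',
    'a_elem_format': 'fp8_e4m3',
    'block_size': 32,
})

# Monkey-patch common torch ops (linear, matmul, ...) with MX-emulating versions
mx_mapping.inject_pyt_ops(mx_specs)
```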
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
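A hedged sketch of the post-training quantization flow through ModelOpt's `mtq.quantize` entry point; `calib_dataloader` is a stand-in for a user-supplied calibration set, and the exact config name (e.g. `FP8_DEFAULT_CFG`) should be checked against the installed version:

```python
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Run representative batches so ModelOpt can calibrate quantization scales
    for batch in calib_dataloader:  # calib_dataloader: user-supplied
        model(batch)

# Quantize the model with a predefined FP8 recipe
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)
```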
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
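The core entry point is `thunder.jit`, which wraps a standard `nn.Module`; a minimal sketch:

```python
import torch
import thunder

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 256),
)

# Trace the module and compile it; subsequent calls run the optimized version
compiled = thunder.jit(model)
out = compiled(torch.randn(64, 1024))
```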
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Megatron-LLaMA
Best practices for training LLaMA models in Megatron-LM
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
float8_experimental
This repository contains the experimental PyTorch-native float8 training UX
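A sketch of the module-swap workflow the repo exposed, assuming the `swap_linear_with_float8_linear` helper from its README (the code moved around over the repo's life, so names may differ by commit):

```python
import torch
from float8_experimental.float8_linear import Float8Linear
from float8_experimental.float8_linear_utils import swap_linear_with_float8_linear

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.Linear(512, 512),
).cuda()

# Replace each nn.Linear with a Float8Linear that runs its matmuls in float8
# while the rest of training proceeds as usual
swap_linear_with_float8_linear(model, Float8Linear)
```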
tensorrtllm_backend
The Triton TensorRT-LLM Backend
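Once a model is deployed, clients can hit Triton's HTTP generate endpoint; a sketch assuming the quickstart's `ensemble` model name and the default port 8000:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": "What is Triton Inference Server?", "max_tokens": 64},
)
print(resp.json()["text_output"])
```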
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
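A sketch of the high-level `LLM` API available in recent releases (the checkpoint name is just an example); it builds the engine and runs generation in a few lines:

```python
from tensorrt_llm import LLM, SamplingParams

# Builds (or reuses) a TensorRT engine for the given Hugging Face checkpoint
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(max_tokens=32, temperature=0.8),
)
print(outputs[0].outputs[0].text)
```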
WeightWatcher
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
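Typical usage from the README: analyze a trained model's weight matrices and summarize per-layer power-law metrics (the ResNet checkpoint here is just an example):

```python
import torchvision.models as models
import weightwatcher as ww

model = models.resnet18(weights="IMAGENET1K_V1")

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()             # per-layer metrics (alpha, spectral norms, ...)
summary = watcher.get_summary(details)  # aggregate quality estimate
```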
float16-simulator.js
A browser-based simulator for low-precision floating-point calculations
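The simulator itself is JavaScript, but the idea it demonstrates is easy to reproduce; a Python analogue, rounding a value to the nearest representable float16:

```python
import numpy as np

# float16 has a 10-bit mantissa, so 0.1 is not exactly representable
fp16 = np.float16(0.1)
print(float(fp16))        # 0.0999755859375, the nearest float16 value
print(float(fp16) - 0.1)  # the rounding error
```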
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
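The core pattern: wrap your objects with `Accelerator.prepare` and route the backward pass through the accelerator (`model`, `optimizer`, and `dataloader` are assumed to be defined elsewhere):

```python
from accelerate import Accelerator

# mixed_precision accepts "no", "fp16", "bf16", or "fp8"
accelerator = Accelerator(mixed_precision="bf16")
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(**batch).loss          # assumes an HF-style model returning .loss
    accelerator.backward(loss)          # handles gradient scaling under mixed precision
    optimizer.step()
```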
flash-attention
Fast and memory-efficient exact attention
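The main entry point is `flash_attn_func`, which expects packed `(batch, seqlen, nheads, headdim)` tensors in fp16/bf16 on a CUDA device:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 2048, 16, 64
q, k, v = (
    torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
    for _ in range(3)
)

# Exact (not approximate) attention, computed without materializing the
# full seqlen x seqlen score matrix
out = flash_attn_func(q, k, v, causal=True)
```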
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, delivering better performance with lower memory utilization in both training and inference.
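FP8 execution is opt-in through the `fp8_autocast` context manager; a minimal example close to the repo's README:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

# Delayed-scaling FP8 recipe using the E4M3 format
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# GEMMs inside this context run in FP8 on supported hardware
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
out.sum().backward()
```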
Megatron-LM
Ongoing research training transformer models at scale