Jinze Xue (jinzex)

Company: @NVIDIA

Jinze Xue's starred repositories

microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats

Language: Python · License: MIT · Stargazers: 130 · Issues: 0

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python · License: NOASSERTION · Stargazers: 357 · Issues: 0

lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Language: Python · License: Apache-2.0 · Stargazers: 1099 · Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · Stargazers: 19301 · Issues: 0

tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language: Python · License: Apache-2.0 · Stargazers: 15208 · Issues: 0

BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python · License: MIT · Stargazers: 1482 · Issues: 0
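As a rough illustration of the 1-bit idea behind BitNet — sign-binarizing weights with a mean-absolute-value scale — here is a minimal stdlib sketch. This is illustrative only: it is not this repository's API, and BitNet's actual scheme also centers the weights and operates per linear layer inside the Transformer.

```python
def binarize(weights):
    """Sign-binarize a weight list to +/- alpha, where alpha = mean |w|.

    Illustrative sketch of the BitNet idea, not the paper's exact scheme:
    the real method also subtracts the mean weight before taking the sign
    and applies the scale per layer.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

print(binarize([0.5, -1.5, 1.0]))  # [1.0, -1.0, 1.0]
```

Each output weight carries only one bit of information (its sign); the shared scale `alpha` preserves the overall magnitude of the layer.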

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python · License: BSD-3-Clause · Stargazers: 5414 · Issues: 0

Megatron-LLaMA

Best practice for training LLaMA models in Megatron-LM

Language: Python · License: NOASSERTION · Stargazers: 579 · Issues: 0

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Language: C++ · License: BSD-3-Clause · Stargazers: 1165 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 5205 · Issues: 0

gemma

Open weights LLM from Google DeepMind.

Language: Python · License: Apache-2.0 · Stargazers: 2300 · Issues: 0

mlx

MLX: An array framework for Apple silicon

Language: C++ · License: MIT · Stargazers: 16046 · Issues: 0

ieee754

A Python module that finds the IEEE-754 representation of a floating-point number.

Language: Python · License: MIT · Stargazers: 26 · Issues: 0
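The same bit-level view of a float can be reproduced with the standard library alone. This is a sketch using `struct`, not this module's own API:

```python
import struct

def float_to_bits(x: float) -> str:
    """Return the 32-bit IEEE-754 single-precision bit string for x.

    Packs x as a big-endian binary32, then reinterprets the 4 bytes as an
    unsigned integer: bit 31 is the sign, bits 30-23 the biased exponent,
    bits 22-0 the mantissa.
    """
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

# 1.0 -> sign 0, biased exponent 127 (01111111), zero mantissa
print(float_to_bits(1.0))  # 00111111100000000000000000000000
```

Swapping the format codes `>f`/`>I` for `>d`/`>Q` (and `032b` for `064b`) gives the 64-bit double-precision representation instead.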

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 488 · Issues: 0

float8_experimental

This repository contains an experimental PyTorch-native float8 training UX.

Language: Python · License: BSD-3-Clause · Stargazers: 201 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 24114 · Issues: 0

GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Language: Python · License: Apache-2.0 · Stargazers: 7651 · Issues: 0

DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

Language: C++ · License: Apache-2.0 · Stargazers: 362 · Issues: 0

cccl

CUDA Core Compute Libraries

Language: C++ · License: NOASSERTION · Stargazers: 1035 · Issues: 0

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stargazers: 622 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 7781 · Issues: 0

llama

Inference code for Llama models

Language: Python · License: NOASSERTION · Stargazers: 54833 · Issues: 0

WeightWatcher

The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

Language: Python · License: Apache-2.0 · Stargazers: 1420 · Issues: 0

float16-simulator.js

A browser-based simulator for low-precision floating-point calculations

Language: JavaScript · License: MIT · Stargazers: 12 · Issues: 0
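A similar low-precision round trip can be done outside the browser with Python's `struct` module, whose `e` format code is IEEE-754 binary16 (half precision). This is an illustrative sketch unrelated to this project's code:

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE-754 binary16 (half precision).

    Packing with format code "e" rounds x to the nearest representable
    16-bit float; unpacking widens it back to a Python (binary64) float,
    exposing the precision lost along the way.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(round_to_fp16(0.1))  # 0.0999755859375 -- 0.1 is not exactly representable
```

With only 10 mantissa bits, binary16 keeps roughly 3 decimal digits of precision, which is why 0.1 lands on the nearby value 0.0999755859375.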

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python · License: Apache-2.0 · Stargazers: 7462 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 12724 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1699 · Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11149 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 9573 · Issues: 0