Jinze Xue (jinzex)

Company: @NVIDIA

Jinze Xue's starred repositories

microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats

Language: Python · License: MIT · Stargazers: 130 · Issues: 0

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python · License: NOASSERTION · Stargazers: 357 · Issues: 0

lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Language: Python · License: Apache-2.0 · Stargazers: 1099 · Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · Stargazers: 19301 · Issues: 0

tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language: Python · License: Apache-2.0 · Stargazers: 15208 · Issues: 0

BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python · License: MIT · Stargazers: 1482 · Issues: 0
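As a rough illustration of the 1-bit idea behind BitNet — sign-binarizing weights with a mean-absolute-value scale — here is a minimal stdlib sketch. This is illustrative only: it is not this repository's API, and BitNet's actual scheme also centers the weights and operates per linear layer inside the Transformer.

```python
def binarize(weights):
    """Sign-binarize a weight list to +/- alpha, where alpha = mean |w|.

    Illustrative sketch of the BitNet idea, not the paper's exact scheme:
    the real method also subtracts the mean weight before taking the sign
    and applies the scale per layer.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

print(binarize([0.5, -1.5, 1.0]))  # [1.0, -1.0, 1.0]
```

Each output weight carries only one bit of information (its sign); the shared scale `alpha` preserves the overall magnitude of the layer.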

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python · License: BSD-3-Clause · Stargazers: 5414 · Issues: 0

Megatron-LLaMA

Best practice for training LLaMA models in Megatron-LM

Language: Python · License: NOASSERTION · Stargazers: 579 · Issues: 0

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Language: C++ · License: BSD-3-Clause · Stargazers: 1165 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 5205 · Issues: 0

gemma

Open weights LLM from Google DeepMind.

Language: Python · License: Apache-2.0 · Stargazers: 2300 · Issues: 0

mlx

MLX: An array framework for Apple silicon

Language: C++ · License: MIT · Stargazers: 16046 · Issues: 0

ieee754

A Python module that finds the IEEE-754 representation of a floating-point number.

Language: Python · License: MIT · Stargazers: 26 · Issues: 0
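The same bit-level view of a float can be reproduced with the standard library alone. This is a sketch using `struct`, not this module's own API:

```python
import struct

def float_to_bits(x: float) -> str:
    """Return the 32-bit IEEE-754 single-precision bit string for x.

    Packs x as a big-endian binary32, then reinterprets the 4 bytes as an
    unsigned integer: bit 31 is the sign, bits 30-23 the biased exponent,
    bits 22-0 the mantissa.
    """
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

# 1.0 -> sign 0, biased exponent 127 (01111111), zero mantissa
print(float_to_bits(1.0))  # 00111111100000000000000000000000
```

Swapping the format codes `>f`/`>I` for `>d`/`>Q` (and `032b` for `064b`) gives the 64-bit double-precision representation instead.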

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 488 · Issues: 0

float8_experimental

This repository contains an experimental PyTorch-native float8 training UX.

Language: Python · License: BSD-3-Clause · Stargazers: 201 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 24114 · Issues: 0

GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Language: Python · License: Apache-2.0 · Stargazers: 7651 · Issues: 0

DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

Language: C++ · License: Apache-2.0 · Stargazers: 362 · Issues: 0

cccl

CUDA Core Compute Libraries

Language: C++ · License: NOASSERTION · Stargazers: 1035 · Issues: 0

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stargazers: 622 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 7781 · Issues: 0

llama

Inference code for Llama models

Language: Python · License: NOASSERTION · Stargazers: 54833 · Issues: 0

WeightWatcher

The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

Language: Python · License: Apache-2.0 · Stargazers: 1420 · Issues: 0

float16-simulator.js

A browser-based simulator for low-precision floating-point calculations

Language: JavaScript · License: MIT · Stargazers: 12 · Issues: 0
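A similar low-precision round trip can be done outside the browser with Python's `struct` module, whose `e` format code is IEEE-754 binary16 (half precision). This is an illustrative sketch unrelated to this project's code:

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE-754 binary16 (half precision).

    Packing with format code "e" rounds x to the nearest representable
    16-bit float; unpacking widens it back to a Python (binary64) float,
    exposing the precision lost along the way.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(round_to_fp16(0.1))  # 0.0999755859375 -- 0.1 is not exactly representable
```

With only 10 mantissa bits, binary16 keeps roughly 3 decimal digits of precision, which is why 0.1 lands on the nearby value 0.0999755859375.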

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python · License: Apache-2.0 · Stargazers: 7462 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 12724 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1699 · Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11149 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 9573 · Issues: 0