LiuXinyu (lauthu)

Company: Kuaishou (快手)

Location: Beijing, China


LiuXinyu's starred repositories

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python · License: NOASSERTION · Stars: 536 · Issues: 0

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Language: Python · Stars: 75 · Issues: 0
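The W4A8 scheme named above (4-bit weights, 8-bit activations) can be illustrated with a toy numpy sketch. This is a generic symmetric quantizer for illustration only, not QQQ's actual algorithm or kernels:

```python
import numpy as np

def quantize_weights_int4(W):
    # Per-output-channel symmetric quantization to the 4-bit range [-8, 7].
    scale = np.abs(W).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)
    return q, scale

def quantize_activations_int8(x):
    # Per-tensor symmetric quantization to the 8-bit range [-128, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def w4a8_matmul(x, W):
    qw, sw = quantize_weights_int4(W)
    qx, sx = quantize_activations_int8(x)
    # Integer accumulation, then rescale back to floating point.
    acc = qx.astype(np.int32) @ qw.T.astype(np.int32)
    return acc * (sx * sw.T)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((32, 64)).astype(np.float32)
y_ref = x @ W.T
y_q = w4a8_matmul(x, W)
err = np.abs(y_q - y_ref).mean() / np.abs(y_ref).mean()
```

Real W4A8 kernels keep the weights packed in 4-bit form and fuse the dequantization into the GEMM; the sketch only shows the numerics.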

tensorrt_backend

The Triton backend for TensorRT.

Language: C++ · License: BSD-3-Clause · Stars: 62 · Issues: 0

pytorch_backend

The Triton backend for the PyTorch TorchScript models.

Language: C++ · License: BSD-3-Clause · Stars: 123 · Issues: 0

blog_os

Writing an OS in Rust

Language: HTML · License: Apache-2.0 · Stars: 15782 · Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stars: 24354 · Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stars: 3575 · Issues: 0

unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · Stars: 17934 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stars: 1955 · Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stars: 12064 · Issues: 0

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton

Language: Python · License: MIT · Stars: 1324 · Issues: 0

easy_rust

Rust explained using easy English

Language: Shell · License: MIT · Stars: 8097 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stars: 138 · Issues: 0
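The key idea behind FlashAttention is computing exact softmax attention block by block with an online (running-max) softmax, so the full attention matrix is never materialized. A numpy sketch of that numerics, not the CUDA kernel:

```python
import numpy as np

def attention_ref(Q, K, V):
    # Standard attention: softmax(QK^T / sqrt(d)) V.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def attention_blocked(Q, K, V, block=16):
    n, d = Q.shape
    O = np.zeros((n, V.shape[1]))
    m = np.full((n, 1), -np.inf)   # running row-wise max of scores
    l = np.zeros((n, 1))           # running softmax denominator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        p = np.exp(S - m_new)
        # Rescale previous partial sums when the running max changes.
        correction = np.exp(m - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        O = O * correction + p @ V[j:j + block]
        m = m_new
    return O / l

rng = np.random.default_rng(1)
Q = rng.standard_normal((32, 8))
K = rng.standard_normal((48, 8))
V = rng.standard_normal((48, 8))
diff = np.abs(attention_blocked(Q, K, V) - attention_ref(Q, K, V)).max()
```

Because each block's contribution is rescaled as the running max updates, the result matches the reference exactly (up to floating-point error) while touching only one `block × d` tile of K and V at a time.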

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python · License: Apache-2.0 · Stars: 4611 · Issues: 0

Programming_Massively_Parallel_Processors

Code and notes for the six major CUDA parallel-computation patterns

Language: Cuda · Stars: 58 · Issues: 0

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stars: 703 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 29846 · Issues: 0

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Language: Jupyter Notebook · License: MIT · Stars: 10449 · Issues: 0
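The "tiny scalar-valued autograd engine" idea fits in a few dozen lines: each operation records its inputs and a closure that propagates gradients. A minimal sketch in the spirit of micrograd (not its exact code):

```python
class Value:
    """A scalar that tracks its computation graph for reverse-mode autodiff."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, then apply chain rule in reverse order.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a   # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
```

The real library adds more operators (pow, tanh, ReLU) and a small `nn` module layered on the same `Value` class.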

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python · License: MIT · Stars: 10687 · Issues: 0
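LoRA's core trick is to freeze the pretrained weight W and learn only a low-rank update B @ A, scaled by alpha / r. A hypothetical numpy sketch of the forward pass (loralib itself wraps PyTorch layers):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, rank r
B = np.zeros((d_out, r))                   # trainable, zero-initialized so
                                           # the update starts as a no-op

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A, applied without
    # materializing the merged matrix: two skinny matmuls instead.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
y = lora_forward(x)
```

Because B starts at zero, the adapted model initially reproduces the base model exactly; only the small A and B matrices (r * (d_in + d_out) parameters) are updated during fine-tuning.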

pytest-benchmark

pytest fixture for benchmarking code

Language: Python · License: BSD-2-Clause · Stars: 1252 · Issues: 0

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stars: 7954 · Issues: 0

LLM-QAT

Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"

Language: Python · License: NOASSERTION · Stars: 253 · Issues: 0

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 9707 · Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 2297 · Issues: 0

skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Language: Python · License: Apache-2.0 · Stars: 6757 · Issues: 0

database-system-readings

A curated reading list about database systems

License: MIT · Stars: 466 · Issues: 0

magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Language: Python · License: BSD-3-Clause · Stars: 10470 · Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · Stars: 20147 · Issues: 0

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Language: Python · License: Apache-2.0 · Stars: 1923 · Issues: 0
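GPTQ itself uses second-order (Hessian-based) error compensation, which is too involved for a sketch; the round-to-nearest (RTN) baseline that the paper improves on is simple enough to show. A hedged numpy illustration:

```python
import numpy as np

def rtn_quantize(W, bits=4):
    # Per-row asymmetric round-to-nearest quantization to 2**bits levels.
    levels = 2 ** bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    q = np.round((W - w_min) / scale)      # integer codes in [0, levels]
    return q * scale + w_min               # dequantized weights

rng = np.random.default_rng(2)
W = rng.standard_normal((16, 256))
W_q = rtn_quantize(W)
max_err = np.abs(W_q - W).max()
```

RTN quantizes each weight independently, so its error is bounded by half a quantization step; GPTQ instead quantizes columns sequentially and adjusts the remaining weights to compensate, which is what lets it hold accuracy at 3-4 bits.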