LiuXinyu (lauthu)

Company: Kuaishou (快手)

Location: Beijing, China

LiuXinyu's starred repositories

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 27910 · Watchers: 229 · Issues: 4705
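
A minimal offline-inference sketch with vLLM's Python API; the model name below is only an example:

```python
# Minimal offline batch inference with vLLM.
# The model name is just an example; any HF-compatible causal LM works.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```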

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · Stargazers: 19605 · Watchers: 158 · Issues: 1497

datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Language: Python · License: Apache-2.0 · Stargazers: 19102 · Watchers: 280 · Issues: 2908
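
Typical usage is a one-line load from the Hub followed by map/filter transforms; the dataset name here is just an example:

```python
# Load a dataset from the Hugging Face Hub and transform it.
from datasets import load_dataset

ds = load_dataset("imdb", split="train")          # downloads and caches the dataset
ds = ds.filter(lambda ex: ex["label"] == 1)       # keep positive reviews
ds = ds.map(lambda ex: {"n_words": len(ex["text"].split())})  # add a column
print(ds[0]["n_words"])
```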

nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python · License: MIT · Stargazers: 10443 · Watchers: 68 · Issues: 105
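
The library freezes the pretrained weight W0 and learns a low-rank update BA. A minimal sketch with loralib:

```python
# Swap a dense layer for its LoRA counterpart, then freeze everything
# except the low-rank A/B matrices before fine-tuning.
import torch.nn as nn
import loralib as lora

model = nn.Sequential(
    lora.Linear(768, 768, r=16),   # W = W0 + B @ A, with rank r = 16
    nn.ReLU(),
    nn.Linear(768, 10),
)
lora.mark_only_lora_as_trainable(model)  # only A and B receive gradients
```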

magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Language: Python · License: BSD-3-Clause · Stargazers: 10401 · Watchers: 104 · Issues: 146

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Language: Jupyter Notebook · License: MIT · Stargazers: 10139 · Watchers: 149 · Issues: 30
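
A small worked example of the Value API and reverse-mode backprop:

```python
# micrograd in a nutshell: build a scalar expression graph, then backprop.
from micrograd.engine import Value

a = Value(2.0)
b = Value(-3.0)
c = a * b + a**2          # c = -6 + 4 = -2
c.backward()              # reverse-mode autodiff over the graph
print(a.grad)             # dc/da = b + 2a = 1.0
print(b.grad)             # dc/db = a = 2.0
```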

mistral-src

Reference implementation of the Mistral AI 7B v0.1 model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 8772 · Watchers: 116 · Issues: 115

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 8342 · Watchers: 90 · Issues: 1834
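
A hedged sketch of the high-level LLM API from recent releases; older versions use an explicit builder/engine workflow, and the model name is only an example:

```python
# High-level TensorRT-LLM sketch: the LLM class builds a TensorRT engine
# under the hood, then serves generate() calls from it.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=32)

for out in llm.generate(["Explain KV caching in one sentence."], params):
    print(out.outputs[0].text)
```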

easy_rust

Rust explained using easy English

Language: Shell · License: MIT · Stargazers: 8072 · Watchers: 149 · Issues: 44

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stargazers: 7903 · Watchers: 77 · Issues: 161

skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Language: Python · License: Apache-2.0 · Stargazers: 6631 · Watchers: 70 · Issues: 1745
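
A hedged sketch of the Python API; the same task is more commonly written as a YAML file and launched with `sky launch`:

```python
# Define a task, attach GPU resources, and launch it on whichever cloud
# (or Kubernetes cluster) SkyPilot finds available.
import sky

task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))

sky.launch(task, cluster_name="demo")  # provisions a cluster and runs the task
```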

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python · License: Apache-2.0 · Stargazers: 4483 · Watchers: 47 · Issues: 193

distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 4344 · Watchers: 132 · Issues: 350

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · Stargazers: 3539 · Watchers: 33 · Issues: 441

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python · License: Apache-2.0 · Stargazers: 2465 · Watchers: 23 · Issues: 180

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2234 · Watchers: 33 · Issues: 87

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python · License: Apache-2.0 · Stargazers: 2180 · Watchers: 34 · Issues: 201
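
A hedged post-training-quantization sketch following the 2.x `fit` API; the toy model and calibration loader are stand-ins for real ones:

```python
# Post-training static quantization with Intel Neural Compressor (2.x API).
# The model and calibration data here are toy stand-ins.
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(8), 0) for _ in range(16)], batch_size=4  # (input, label) pairs
)

q_model = fit(model=fp32_model, conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)
q_model.save("./quantized_model")
```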

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Language: Python · License: Apache-2.0 · Stargazers: 1893 · Watchers: 29 · Issues: 48

Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

Language: Python · License: MIT · Stargazers: 1538 · Watchers: 30 · Issues: 185

pytest-benchmark

py.test fixture for benchmarking code

Language: Python · License: BSD-2-Clause · Stargazers: 1237 · Watchers: 20 · Issues: 188
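
Usage is a fixture: request `benchmark` in a test and hand it the callable to time:

```python
# Run with `pytest` after installing pytest-benchmark; the plugin injects
# the `benchmark` fixture and reports timing statistics per test.
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def test_fib_10(benchmark):
    result = benchmark(fib, 10)   # fib(10) is timed over many rounds
    assert result == 55
```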

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python · License: MIT · Stargazers: 1201 · Watchers: 21 · Issues: 87
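
The core trick is migrating quantization difficulty from activations to weights with per-channel scales s_j = max|X_j|^α / max|W_j|^(1−α); a toy sketch with random tensors (α = 0.5):

```python
# Toy illustration of SmoothQuant's scale migration:
# Y = (X diag(s)^-1)(diag(s) W) leaves the product unchanged while
# flattening activation outliers into the weights.
import torch

alpha = 0.5
X = torch.randn(32, 64) * torch.logspace(-2, 2, 64)  # activations with outlier channels
W = torch.randn(64, 16)

s = X.abs().amax(dim=0).pow(alpha) / W.abs().amax(dim=1).pow(1 - alpha)
X_hat, W_hat = X / s, W * s[:, None]

assert torch.allclose(X @ W, X_hat @ W_hat, rtol=1e-3, atol=1e-3)
print(X.abs().max().item(), "->", X_hat.abs().max().item())  # outliers shrink
```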

onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

Language: C++ · License: MIT · Stargazers: 1162 · Watchers: 38 · Issues: 156
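
A minimal inference sketch; the model path and input shape are placeholders:

```python
# Load an ONNX model and run one input through it with ONNX Runtime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: x})   # None = return all model outputs
print(outputs[0].shape)
```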

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stargazers: 669 · Watchers: 23 · Issues: 470

database-system-readings

A curated reading list about database systems

LLM-QAT

Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"

Language: Python · License: NOASSERTION · Stargazers: 242 · Watchers: 5 · Issues: 30

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 131 · Watchers: 10 · Issues: 36
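
A usage sketch of `flash_attn_func`, which expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on a CUDA device:

```python
# Fused exact attention via flash-attn; causal=True gives autoregressive masking.
import torch
from flash_attn import flash_attn_func

q, k, v = (
    torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)
out = flash_attn_func(q, k, v, causal=True)  # same shape as q
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```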

Programming_Massively_Parallel_Processors

Code and notes for the six major CUDA parallel-computing patterns

Language: Cuda · Stargazers: 58 · Watchers: 2 · Issues: 0