Yanqi Zhang's starred repositories
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
StreamDiffusion
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
llama-cpp-python
Python bindings for llama.cpp
lm-evaluation-harness
A framework for few-shot evaluation of language models.
FasterTransformer
Transformer-related optimization, including BERT and GPT
Awesome-Incremental-Learning
Awesome Incremental Learning
flashinfer
FlashInfer: Kernel Library for LLM Serving
torchgpipe
A GPipe implementation in PyTorch
aws-neuron-sdk
Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluating foundation LLMs, aiming to explore the technical boundaries of generative AI.
k8s-dra-driver
Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes