lliquid

Panpan XU's starred repositories

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | License: NOASSERTION | 25114 278 77

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | 11595 168 229

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 8007 87 1740

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | 6232 35 1021

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | 5905 47 78

gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Language: Python | License: BSD-3-Clause | 5473 64 97

sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | 4587 45 433

tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

Language: C++ | License: NOASSERTION | 3628 50 381

mamba-minimal

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Language: Python | License: Apache-2.0 | 2513 23 26

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook | License: Apache-2.0 | 2155 32 85

Awesome-Text2SQL

Curated tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more.

License: MIT | 1543 17 5

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python | License: MIT | 1160 21 86

streaming

A Data Streaming Library for Efficient Neural Network Training

Language: Python | License: Apache-2.0 | 1062 20 160

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | 1031 15 91

blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Language: Cuda | License: MIT | 1020 199 48

extension-cpp

C++ extensions in PyTorch

Language: Python | 979 35 74

dolma

Data and tools for generating and inspecting OLMo pre-training data.

Language: Python | License: Apache-2.0 | 889 18 67

safari

Convolutions for Sequence Modeling

Language: Assembly | License: Apache-2.0 | 858 35 38

llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Language: Python | License: Apache-2.0 | 830 8 18

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Language: Python | License: Apache-2.0 | 528 15 26

megalodon

Reference implementation of Megalodon 7B model

Language: Cuda | License: MIT | 500 14 7

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

Language: C | License: Apache-2.0 | 274 12 35