CHEN Yuhan (lzzmm)


Company: HKUST (Guangzhou)

Location: Guangzhou

Home Page: https://lzzmm.github.io

Organizations
sysu

CHEN Yuhan's starred repositories

llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Language: Python | Stargazers: 1183 | Issues: 0
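
A hypothetical sketch of the workflow described above: OCR a scanned page with Tesseract, then ask an LLM to correct the raw text. The file name, model choice, and prompt are illustrative assumptions, not the repository's actual pipeline.

```python
# Hypothetical OCR-then-LLM-correction sketch; not code from llm_aided_ocr itself.
import pytesseract
from PIL import Image
from openai import OpenAI

# Step 1: extract raw (often noisy) text from a scanned page with Tesseract.
raw_text = pytesseract.image_to_string(Image.open("scanned_page.png"))  # assumed input file

# Step 2: ask an LLM to fix OCR artifacts without rewording the content.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[{
        "role": "user",
        "content": "Correct OCR errors in the following text while preserving its wording:\n\n" + raw_text,
    }],
)
print(response.choices[0].message.content)
```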

TensorRT-Incubator

Experimental projects related to TensorRT

Language: MLIR | Stargazers: 54 | Issues: 0

chatbox

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

Language: TypeScript | License: GPL-3.0 | Stargazers: 20334 | Issues: 0

flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Language: Cuda | License: Apache-2.0 | Stargazers: 526 | Issues: 0
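
For orientation, a rough Python sketch (my own illustration, not the repo's CUDA code) of the tiled, online-softmax forward pass that a minimal Flash Attention kernel computes, checked against naive attention:

```python
import torch

def flash_attention_forward(q, k, v, block_size=2):
    """Tiled forward pass with an online softmax, looping over key/value blocks."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, block_size):
        kb = k[start:start + block_size]                       # key block
        vb = v[start:start + block_size]                       # value block
        scores = (q @ kb.T) * scale                            # (seq_len, block)
        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        p = torch.exp(scores - new_max)
        correction = torch.exp(row_max - new_max)              # rescale old accumulators
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(8, 4) for _ in range(3))
ref = torch.softmax((q @ k.T) / 4 ** 0.5, dim=-1) @ v          # naive attention
assert torch.allclose(flash_attention_forward(q, k, v), ref, atol=1e-5)
```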

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 7929 | Issues: 0
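
A minimal sketch of the high-level Python API the description refers to, assuming a recent tensorrt_llm release that ships the LLM / SamplingParams interface; the model name and output fields are illustrative and may differ by version.

```python
# Sketch of TensorRT-LLM's high-level Python API; details vary across releases.
from tensorrt_llm import LLM, SamplingParams

# Engine building happens under the hood when the LLM object is created.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example Hugging Face model id
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["Explain what a TensorRT engine is."], params):
    print(output.outputs[0].text)
```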

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 10603 | Issues: 0

gpu.cpp

A lightweight library for portable low-level GPU computation using WebGPU.

Language: C++ | License: Apache-2.0 | Stargazers: 3527 | Issues: 0

turingas

Assembler for NVIDIA Volta and Turing GPUs

Language: Python | License: MIT | Stargazers: 192 | Issues: 0

MInference

To speed up inference for long-context LLMs, MInference computes attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 638 | Issues: 0

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 143 | Issues: 0

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda | License: Apache-2.0 | Stargazers: 164 | Issues: 0

perf-ninja

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

Language: C++ | Stargazers: 2406 | Issues: 0

BurstGPT

A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Language: Python | License: CC-BY-4.0 | Stargazers: 104 | Issues: 0

awesome-local-ai

An awesome repository of local AI tools

Stargazers: 1102 | Issues: 0

cortex

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan

Language: C++ | License: Apache-2.0 | Stargazers: 1845 | Issues: 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 29092 | Issues: 0

cppinsights

C++ Insights - See your source code with the eyes of a compiler

Language: C++ | License: MIT | Stargazers: 4003 | Issues: 0

Pruner-Zero

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Language: Python | License: MIT | Stargazers: 60 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 25540 | Issues: 0

compiler-and-arch

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

Stargazers: 356 | Issues: 0

llama3-from-scratch

llama3 implementation, one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 11823 | Issues: 0
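
In the same spirit as that notebook, a tiny self-contained example of causal self-attention written as explicit matrix multiplications (my own illustration, not code from the repo):

```python
import torch

torch.manual_seed(0)
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)                    # token embeddings
W_q = torch.randn(d_model, d_model)                  # projection weights
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

q = x @ W_q                                          # queries
k = x @ W_k                                          # keys
v = x @ W_v                                          # values
scores = (q @ k.T) / d_model ** 0.5                  # scaled dot-product scores
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
scores = scores.masked_fill(mask, float("-inf"))     # causal mask: no attending to future tokens
attn = torch.softmax(scores, dim=-1)                 # attention weights
out = attn @ v                                       # weighted sum of values
print(out.shape)                                     # torch.Size([4, 8])
```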

vidur

A large-scale simulation framework for LLM inference

Language: Python | License: MIT | Stargazers: 195 | Issues: 0

HEBO

Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook | Stargazers: 3151 | Issues: 0
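
A short usage sketch of a suggest/observe Bayesian-optimisation loop with HEBO; the module paths follow the project's README and may differ by version, and the quadratic objective is a toy assumption.

```python
# Toy suggest/observe loop; module paths assumed from HEBO's README.
import numpy as np
import pandas as pd
from hebo.design_space.design_space import DesignSpace
from hebo.optimizers.hebo import HEBO

def objective(params: pd.DataFrame) -> np.ndarray:
    # Minimise (x - 0.37)^2; HEBO expects a column vector of objective values.
    return ((params[["x"]].values - 0.37) ** 2).sum(axis=1).reshape(-1, 1)

space = DesignSpace().parse([{"name": "x", "type": "num", "lb": -3.0, "ub": 3.0}])
opt = HEBO(space)
best = float("inf")
for _ in range(8):
    rec = opt.suggest(n_suggestions=4)    # candidate points as a DataFrame
    y = objective(rec)
    opt.observe(rec, y)                   # feed evaluations back to the optimiser
    best = min(best, float(y.min()))
print("best objective value:", best)
```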

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 24890 | Issues: 0
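
A minimal offline-inference sketch with vLLM's Python API; the model id is just a small example.

```python
# Minimal vLLM offline generation example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                    # example model id
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```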

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 22666 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 12250 | Issues: 0
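
The canonical first Triton kernel, a blocked vector add in the style of Triton's introductory tutorial, illustrates the programming model referred to above.

```python
# Blocked vector-add kernel in Triton (requires a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                          # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                          # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```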

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8256 | Issues: 0
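
A sketch of apex's legacy amp interface for mixed precision, assuming apex is installed with CUDA extensions; newer code typically uses torch.cuda.amp instead.

```python
# Legacy apex.amp mixed-precision sketch; assumes a CUDA build of apex.
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# opt_level "O1" patches selected ops to run in FP16 while keeping master weights in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(32, 1024, device="cuda")
loss = model(x).pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:    # loss scaling to avoid FP16 underflow
    scaled_loss.backward()
optimizer.step()
```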

tvm

Open deep learning compiler stack for CPU, GPU, and specialized accelerators

Language: Python | License: Apache-2.0 | Stargazers: 11517 | Issues: 0
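
A small tensor-expression example in the classic TVM style, declaring a computation, scheduling it, and compiling for CPU; whether this exact API is available depends on the TVM version.

```python
# Classic TVM tensor-expression flow: declare computation, schedule, build, run.
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)               # default schedule
fadd = tvm.build(s, [A, B, C], target="llvm")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
fadd(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
```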

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Language: Python | License: Apache-2.0 | Stargazers: 18404 | Issues: 0