CHEN Yuhan (lzzmm)


Company: HKUST (Guangzhou)

Location: Guangzhou

Home Page: https://lzzmm.github.io

Organizations
sysu

CHEN Yuhan's starred repositories

llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Language: Python | Stargazers: 1183 | Issues: 0
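
A hypothetical sketch of the workflow described above: OCR a scanned page with Tesseract, then ask an LLM to correct the raw text. The file name, model choice, and prompt are illustrative assumptions, not the repository's actual pipeline.

```python
# Hypothetical OCR-then-LLM-correction sketch; not code from llm_aided_ocr itself.
import pytesseract
from PIL import Image
from openai import OpenAI

# Step 1: extract raw (often noisy) text from a scanned page with Tesseract.
raw_text = pytesseract.image_to_string(Image.open("scanned_page.png"))  # assumed input file

# Step 2: ask an LLM to fix OCR artifacts without rewording the content.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[{
        "role": "user",
        "content": "Correct OCR errors in the following text while preserving its wording:\n\n" + raw_text,
    }],
)
print(response.choices[0].message.content)
```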

TensorRT-Incubator

Experimental projects related to TensorRT

Language: MLIR | Stargazers: 54 | Issues: 0

chatbox

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

Language: TypeScript | License: GPL-3.0 | Stargazers: 20334 | Issues: 0

flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Language: Cuda | License: Apache-2.0 | Stargazers: 526 | Issues: 0
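
For orientation, a rough Python sketch (my own illustration, not the repo's CUDA code) of the tiled, online-softmax forward pass that a minimal Flash Attention kernel computes, checked against naive attention:

```python
import torch

def flash_attention_forward(q, k, v, block_size=2):
    """Tiled forward pass with an online softmax, looping over key/value blocks."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, block_size):
        kb = k[start:start + block_size]                       # key block
        vb = v[start:start + block_size]                       # value block
        scores = (q @ kb.T) * scale                            # (seq_len, block)
        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        p = torch.exp(scores - new_max)
        correction = torch.exp(row_max - new_max)              # rescale old accumulators
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(8, 4) for _ in range(3))
ref = torch.softmax((q @ k.T) / 4 ** 0.5, dim=-1) @ v          # naive attention
assert torch.allclose(flash_attention_forward(q, k, v), ref, atol=1e-5)
```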

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 7929 | Issues: 0
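
A minimal sketch of the high-level Python API the description refers to, assuming a recent tensorrt_llm release that ships the LLM / SamplingParams interface; the model name and output fields are illustrative and may differ by version.

```python
# Sketch of TensorRT-LLM's high-level Python API; details vary across releases.
from tensorrt_llm import LLM, SamplingParams

# Engine building happens under the hood when the LLM object is created.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example Hugging Face model id
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["Explain what a TensorRT engine is."], params):
    print(output.outputs[0].text)
```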

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 10603 | Issues: 0

gpu.cpp

A lightweight library for portable low-level GPU computation using WebGPU.

Language: C++ | License: Apache-2.0 | Stargazers: 3527 | Issues: 0

turingas

Assembler for NVIDIA Volta and Turing GPUs

Language: Python | License: MIT | Stargazers: 192 | Issues: 0

MInference

To speed up inference for long-context LLMs, MInference computes attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 638 | Issues: 0

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 143 | Issues: 0

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda | License: Apache-2.0 | Stargazers: 164 | Issues: 0

perf-ninja

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

Language: C++ | Stargazers: 2406 | Issues: 0

BurstGPT

A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Language: Python | License: CC-BY-4.0 | Stargazers: 104 | Issues: 0

awesome-local-ai

An awesome repository of local AI tools

Stargazers: 1102 | Issues: 0

cortex

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan

Language: C++ | License: Apache-2.0 | Stargazers: 1845 | Issues: 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 29092 | Issues: 0

cppinsights

C++ Insights - See your source code with the eyes of a compiler

Language: C++ | License: MIT | Stargazers: 4003 | Issues: 0

Pruner-Zero

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Language: Python | License: MIT | Stargazers: 60 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 25540 | Issues: 0

compiler-and-arch

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

Stargazers: 356 | Issues: 0

llama3-from-scratch

llama3 implementation, one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 11823 | Issues: 0
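
In the same spirit as that notebook, a tiny self-contained example of causal self-attention written as explicit matrix multiplications (my own illustration, not code from the repo):

```python
import torch

torch.manual_seed(0)
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)                    # token embeddings
W_q = torch.randn(d_model, d_model)                  # projection weights
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

q = x @ W_q                                          # queries
k = x @ W_k                                          # keys
v = x @ W_v                                          # values
scores = (q @ k.T) / d_model ** 0.5                  # scaled dot-product scores
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
scores = scores.masked_fill(mask, float("-inf"))     # causal mask: no attending to future tokens
attn = torch.softmax(scores, dim=-1)                 # attention weights
out = attn @ v                                       # weighted sum of values
print(out.shape)                                     # torch.Size([4, 8])
```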

vidur

A large-scale simulation framework for LLM inference

Language: Python | License: MIT | Stargazers: 195 | Issues: 0

HEBO

Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook | Stargazers: 3151 | Issues: 0
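
A short usage sketch of a suggest/observe Bayesian-optimisation loop with HEBO; the module paths follow the project's README and may differ by version, and the quadratic objective is a toy assumption.

```python
# Toy suggest/observe loop; module paths assumed from HEBO's README.
import numpy as np
import pandas as pd
from hebo.design_space.design_space import DesignSpace
from hebo.optimizers.hebo import HEBO

def objective(params: pd.DataFrame) -> np.ndarray:
    # Minimise (x - 0.37)^2; HEBO expects a column vector of objective values.
    return ((params[["x"]].values - 0.37) ** 2).sum(axis=1).reshape(-1, 1)

space = DesignSpace().parse([{"name": "x", "type": "num", "lb": -3.0, "ub": 3.0}])
opt = HEBO(space)
best = float("inf")
for _ in range(8):
    rec = opt.suggest(n_suggestions=4)    # candidate points as a DataFrame
    y = objective(rec)
    opt.observe(rec, y)                   # feed evaluations back to the optimiser
    best = min(best, float(y.min()))
print("best objective value:", best)
```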

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 24890 | Issues: 0
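
A minimal offline-inference sketch with vLLM's Python API; the model id is just a small example.

```python
# Minimal vLLM offline generation example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                    # example model id
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```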

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 22666 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 12250 | Issues: 0
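
The canonical first Triton kernel, a blocked vector add in the style of Triton's introductory tutorial, illustrates the programming model referred to above.

```python
# Blocked vector-add kernel in Triton (requires a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                          # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                          # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```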

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8256 | Issues: 0
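
A sketch of apex's legacy amp interface for mixed precision, assuming apex is installed with CUDA extensions; newer code typically uses torch.cuda.amp instead.

```python
# Legacy apex.amp mixed-precision sketch; assumes a CUDA build of apex.
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# opt_level "O1" patches selected ops to run in FP16 while keeping master weights in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(32, 1024, device="cuda")
loss = model(x).pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:    # loss scaling to avoid FP16 underflow
    scaled_loss.backward()
optimizer.step()
```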

tvm

Open deep learning compiler stack for CPU, GPU, and specialized accelerators

Language: Python | License: Apache-2.0 | Stargazers: 11517 | Issues: 0
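
A small tensor-expression example in the classic TVM style, declaring a computation, scheduling it, and compiling for CPU; whether this exact API is available depends on the TVM version.

```python
# Classic TVM tensor-expression flow: declare computation, schedule, build, run.
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)               # default schedule
fadd = tvm.build(s, [A, B, C], target="llvm")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
fadd(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
```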

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Language: Python | License: Apache-2.0 | Stargazers: 18404 | Issues: 0