kevin__liu's repositories
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
CUDA-Learn-Note
🎉 CUDA notes / frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Awesome-LLM-Inference
💻 A small collection for Awesome LLM Inference [Papers|Blogs|Docs] with code, covering TensorRT-LLM, streaming-llm, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
llama.cpp
Port of Facebook's LLaMA model in C/C++
LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
DeepLearningSystem
An introduction to the core principles of deep learning systems.
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
whisper.cpp
Port of OpenAI's Whisper model in C/C++
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
gloo
Collective communications library with various primitives for multi-machine training.
DeepSpeedExamples
Example models using DeepSpeed
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
ChatGPT
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
FasterTransformer
Transformer-related optimization, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
pwndbg
Exploit Development and Reverse Engineering with GDB Made Easy
glibc
Unofficial mirror of sourceware glibc repository. Updated daily.
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
the-algorithm
Source code for Twitter's Recommendation Algorithm
triton
Development repository for the Triton language and compiler
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.