kevin-14

kevin__liu's repositories

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Apache-2.0

CUDA-Learn-Note

🎉 CUDA notes / collection of frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

GPL-3.0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Apache-2.0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Apache-2.0

Awesome-LLM-Inference

💻 A small collection for Awesome LLM Inference [Papers|Blogs|Docs] with code, covering TensorRT-LLM, streaming-llm, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

GPL-3.0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Apache-2.0

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

MIT

llama.cpp

Port of Facebook's LLaMA model in C/C++

MIT

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

DeepLearningSystem

Deep Learning System core principles introduction.

Apache-2.0

transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

Apache-2.0

whisper.cpp

Port of OpenAI's Whisper model in C/C++

MIT

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

NOASSERTION

gloo

Collective communications library with various primitives for multi-machine training.

NOASSERTION

DeepSpeedExamples

Example models using DeepSpeed

Apache-2.0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Apache-2.0

ChatGPT

🔮 ChatGPT Desktop Application (Mac, Windows and Linux)

AGPL-3.0

FasterTransformer

Transformer related optimization, including BERT, GPT

Apache-2.0

flash-attention

Fast and memory-efficient exact attention

BSD-3-Clause

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

BSD-3-Clause

tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece

Apache-2.0

mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Apache-2.0

pwndbg

Exploit Development and Reverse Engineering with GDB Made Easy

MIT

glibc

Unofficial mirror of sourceware glibc repository. Updated daily.

NOASSERTION

pdfs

Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Apache-2.0

the-algorithm

Source code for Twitter's Recommendation Algorithm

AGPL-3.0

triton

Development repository for the Triton language and compiler

MIT

airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Apache-2.0

web-stable-diffusion

Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.

Apache-2.0