llx (llx-08)


Location: Beijing, China


llx's starred repositories

splitwise-sim

LLM serving cluster simulator

Language: Jupyter Notebook · License: MIT · Stars: 36

vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

Language: C · License: MIT · Stars: 131

InfiniGen

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Language: Python · License: Apache-2.0 · Stars: 22

FlashAttention-PyTorch

Implementation of FlashAttention in PyTorch

Language: Python · License: MIT · Stars: 93
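The key trick that makes FlashAttention's tiling possible is the online (streaming) softmax, which computes a numerically stable softmax in one pass without materializing all scores at once. This is a minimal pure-Python sketch of that idea, not code from the repository:

```python
import math

def online_softmax(scores):
    """One-pass numerically stable softmax.

    Maintains a running max m and a running sum s of exp(x - m);
    whenever the max grows, the old sum is rescaled to the new max.
    """
    m = float("-inf")  # running maximum of the scores seen so far
    s = 0.0            # running sum of exp(score - m)
    for x in scores:
        m_new = max(m, x)
        # rescale the previous partial sum to the new max, then add the new term
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / s for x in scores]

probs = online_softmax([1.0, 2.0, 3.0])
```

In FlashAttention the same running `(m, s)` pair is kept per query row while attention scores are produced tile by tile, so the full score matrix never has to fit in memory.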

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stars: 164

KuiperLLama

Build an LLM inference framework from scratch, hands-on

Language: C++ · Stars: 96

ServerlessLLM

Cost-efficient and fast multi-LLM serving.

Language: Python · Stars: 118

Triton-Puzzles

Puzzles for learning Triton

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 891

how-to-optim-algorithm-in-cuda

How to optimize common algorithms in CUDA.

Language: Cuda · Stars: 1283

CUDA-Learn-Notes

🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda · License: GPL-3.0 · Stars: 934

MInference

To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python · License: MIT · Stars: 598

ncnn

ncnn is a high-performance neural network inference framework optimized for mobile platforms

Language: C++ · License: NOASSERTION · Stars: 19862

googletest

GoogleTest - Google Testing and Mocking Framework

Language: C++ · License: BSD-3-Clause · Stars: 33842

ring-flash-attention

Ring attention implementation with flash attention

Language: Python · Stars: 458
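Ring attention works because per-block attention results can be merged exactly: each device keeps its block's running max, rescaled exp-sum, and unnormalized output, and these triples combine associatively as key/value blocks circulate around the ring. A minimal pure-Python sketch of that merge (scalar values for readability; not code from the repository):

```python
import math

def block_attend(scores, values):
    """Partial attention over one block of (score, value) pairs.

    Returns (m, s, o): block max score, sum of exp(score - m),
    and the unnormalized weighted output sum.
    """
    m = max(scores)
    s = sum(math.exp(x - m) for x in scores)
    o = sum(math.exp(x - m) * v for x, v in zip(scores, values))
    return m, s, o

def merge(a, b):
    """Combine two blocks' partial results, as each ring step does."""
    (ma, sa, oa), (mb, sb, ob) = a, b
    m = max(ma, mb)
    # rescale both partial sums/outputs to the shared max, then add
    s = sa * math.exp(ma - m) + sb * math.exp(mb - m)
    o = oa * math.exp(ma - m) + ob * math.exp(mb - m)
    return m, s, o

# two blocks as if held by two devices in the ring
m, s, o = merge(block_attend([0.5, 2.0], [1.0, 2.0]),
                block_attend([1.0], [3.0]))
out = o / s  # final attention output, identical to attending over all pairs at once
```

Because the merge is exact (not approximate), the result matches single-device attention regardless of how the key/value blocks are partitioned.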

freevpn

Free public VPN node sharing

Stars: 510

ebooks

A collection of classic e-books on history, politics, psychology, philosophy, mathematics, and computer science (about 100,000 books)

Language: JavaScript · Stars: 3216

AI-Software-Startups

A Survey of AI startups

License: MIT · Stars: 391

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stars: 931

ParrotServe

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Language: Python · License: MIT · Stars: 72

CUDA_gemm

A simple high-performance CUDA GEMM implementation.

Language: Cuda · Stars: 304
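The performance of a CUDA GEMM comes largely from tiling: each thread block computes one tile of C from tiles of A and B staged in shared memory. This pure-Python sketch shows only that loop structure (each tile standing in for one thread block's work); it is pedagogical, not code from the repository:

```python
def tiled_matmul(A, B, tile=2):
    """Blocked (tiled) matrix multiply over row-major lists of lists.

    The three outer loops walk tiles of C and the shared k dimension,
    mirroring the grid/shared-memory structure of a CUDA GEMM kernel.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # accumulate the (i0, j0) tile of C from one tile
                # of A and one tile of B (min() handles ragged edges)
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] += acc
    return C

C = tiled_matmul([[1.0, 2.0], [3.0, 4.0]],
                 [[5.0, 6.0], [7.0, 8.0]])
```

On a GPU the inner accumulation runs per thread over operands in shared memory; the win is that each tile of A and B is loaded from global memory once per tile of C rather than once per output element.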

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 2001

AI-System

Educational resources on systems for AI.

Language: Python · License: CC-BY-4.0 · Stars: 3246

aiohttp

Asynchronous HTTP client/server framework for asyncio and Python

Language: Python · License: NOASSERTION · Stars: 14840

Awesome-RoadMaps-and-Interviews

Awesome interviews for coders: programming languages, software engineering, web, backend, distributed infrastructure, data science & AI | interview essentials

Language: HTML · License: NOASSERTION · Stars: 128

cs344

Introduction to Parallel Programming class code

Language: Cuda · Stars: 1289

llumnix

Efficient and easy multi-instance LLM serving

Language: Python · License: Apache-2.0 · Stars: 74

Nsight-Compute-Docker-Image

Nsight Compute in Docker

Language: Dockerfile · License: MIT · Stars: 10