Uranus's starred repositories
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
ThunderKittens
Tile primitives for speedy kernels
MInference
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference with approximate, dynamic sparse attention computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
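The core idea behind dynamic sparse attention can be sketched in a few lines. This is a hypothetical illustration, not MInference's actual kernels or block patterns: per query, only the highest-scoring KV blocks are kept before the softmax. (A real implementation estimates block importance cheaply rather than computing the full score matrix first, which is where the speedup comes from.)

```python
import numpy as np

def topk_block_sparse_attention(q, k, v, block=4, keep=2):
    """Dynamic block-sparse attention sketch: for each query row, keep only
    the `keep` KV blocks with the highest mean score; mask out the rest."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # full scores, for clarity only
    nb = n // block
    # Per-(query, block) importance: mean score within each KV block.
    blk_scores = scores.reshape(n, nb, block).mean(axis=2)
    top = np.argsort(blk_scores, axis=1)[:, -keep:]   # kept block indices per query
    mask = np.full((n, nb), -np.inf)
    np.put_along_axis(mask, top, 0.0, axis=1)
    # Expand the per-block mask to per-column and drop non-selected blocks.
    scores = scores + np.repeat(mask, block, axis=1)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

With `keep` equal to the number of blocks this reduces to dense attention, which makes it easy to sanity-check the masking logic.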
ringattention
Transformers with Arbitrarily Large Context
ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
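The mechanism both ring-attention repos implement can be simulated on a single process. A minimal sketch (NumPy stand-in, not either repo's actual API): each "host" keeps its query block, KV blocks rotate one hop per step around the ring, and partial softmaxes are merged online so the result matches full attention.

```python
import numpy as np

def ring_attention(q, k, v, n_hosts=4):
    """Single-process simulation of ring attention: Q blocks stay put,
    KV blocks circulate; online-softmax statistics merge the partials."""
    seq, d = q.shape
    blk = seq // n_hosts
    q_blocks = [q[i*blk:(i+1)*blk] for i in range(n_hosts)]
    kv_blocks = [(k[i*blk:(i+1)*blk], v[i*blk:(i+1)*blk]) for i in range(n_hosts)]
    outs = []
    for h in range(n_hosts):
        qi = q_blocks[h]
        m = np.full((blk, 1), -np.inf)   # running row max
        l = np.zeros((blk, 1))           # running softmax denominator
        acc = np.zeros((blk, d))         # running weighted-value numerator
        for step in range(n_hosts):
            # KV block that has rotated to host h at this step.
            kj, vj = kv_blocks[(h + step) % n_hosts]
            s = qi @ kj.T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1, keepdims=True))
            scale = np.exp(m - m_new)    # rescale old stats to the new max
            p = np.exp(s - m_new)
            l = l * scale + p.sum(axis=1, keepdims=True)
            acc = acc * scale + p @ vj
            m = m_new
        outs.append(acc / l)
    return np.vstack(outs)
```

Because only one KV block is resident per host at a time, memory per device stays constant as the ring (and hence the context) grows.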
AI-Software-Startups
A survey of AI startups
ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
LLM-Viewer
Analyzes the inference of large language models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
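The roofline model mentioned above reduces to one formula: execution time is the max of compute time and memory-traffic time. A minimal sketch (hypothetical helper, not LLM-Viewer's actual API; peak numbers are A100-like assumptions):

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bw):
    """Simple roofline estimate: time = max(flops/peak_flops, bytes/peak_bw).
    Returns (time_s, bound), where bound names the limiting resource."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / peak_bw
    return max(t_compute, t_memory), ("compute" if t_compute >= t_memory else "memory")

# Example: single-token decode GEMV against a 4096x4096 fp16 weight matrix.
flops = 2 * 4096 * 4096        # one multiply-add per weight
bytes_moved = 2 * 4096 * 4096  # fp16 weight traffic dominates
t, bound = roofline_time(flops, bytes_moved, peak_flops=312e12, peak_bw=2.0e12)
# Arithmetic intensity is ~1 FLOP/byte, far below the device's ridge point,
# so decode is memory-bound -- the usual motivation for KV-cache and weight quantization.
```

This is the style of per-layer analysis such tools automate across a whole model.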
MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
sarathi-serve
A low-latency & high-throughput serving engine for LLMs
vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
SwiftTransformer
High-performance Transformer implementation in C++.
Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk