Uranus (UranusSeven)

Company: Xprobe

Uranus's starred repositories

OpenDevin

🐚 OpenDevin: Code Less, Make More

Language: Python | License: MIT | Stargazers: 29216 | Issues: 281 | Issues: 1193

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 24949 | Issues: 206 | Issues: 211

presto

The official home of the Presto distributed SQL query engine for big data

Language: Java | License: Apache-2.0 | Stargazers: 15805 | Issues: 861 | Issues: 6488

trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java | License: Apache-2.0 | Stargazers: 9979 | Issues: 173 | Issues: 6472

outlines

Structured Text Generation

Language: Python | License: Apache-2.0 | Stargazers: 7409 | Issues: 45 | Issues: 515

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Language: Python | License: Apache-2.0 | Stargazers: 1493 | Issues: 39 | Issues: 394

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

MInference

Speeds up long-context LLM inference by approximating attention with dynamic sparse computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 598 | Issues: 5 | Issues: 34
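
As a rough illustration of the idea behind the entry above — computing attention only over a sparse subset of positions — here is a toy NumPy sketch of one common static sparse pattern (attention "sinks" plus a causal local window). This is a simplification for intuition, not MInference's actual dynamically searched patterns; the `sink` and `window` parameters are hypothetical.

```python
import numpy as np

def sparse_attention(q, k, v, sink=2, window=4):
    """Toy sparse attention: each query attends only to the first `sink`
    tokens plus a causal local `window` of recent tokens."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.full((T, T), -np.inf)
    for i in range(T):
        mask[i, :min(sink, i + 1)] = 0.0              # attention "sink" tokens
        mask[i, max(0, i - window + 1):i + 1] = 0.0   # causal local window
    z = scores + mask
    p = np.exp(z - z.max(axis=-1, keepdims=True))     # numerically stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((10, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 8))
out = sparse_attention(q, k, v)   # (10, 8)
```

Each query row touches at most `sink + window` keys instead of all of them, which is where the pre-filling speedup comes from as context length grows.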

ringattention

Transformers with Arbitrarily Large Context

Language: Python | License: Apache-2.0 | Stargazers: 571 | Issues: 5 | Issues: 15

ring-flash-attention

Ring attention implementation with flash attention

ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Language: Python | License: MIT | Stargazers: 423 | Issues: 10 | Issues: 12
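
The three ring-attention repos above share one primitive: exact attention computed block-by-block over the KV sequence, with partial results merged by log-sum-exp rescaling so KV blocks can circulate around a ring of devices. A minimal single-process NumPy sketch of that merge step (function names are mine, not taken from these repos):

```python
import numpy as np

def attend_block(q, k, v):
    """Unnormalized attention against one KV block, plus softmax stats."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    m = s.max(axis=-1, keepdims=True)    # per-query running max
    p = np.exp(s - m)
    return p @ v, p.sum(axis=-1, keepdims=True), m

def merge(o1, l1, m1, o2, l2, m2):
    """Merge two blocks' partial outputs via log-sum-exp rescaling."""
    m = np.maximum(m1, m2)
    a1, a2 = np.exp(m1 - m), np.exp(m2 - m)
    l = a1 * l1 + a2 * l2
    return (a1 * o1 + a2 * o2) / l, l, m

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8))
v = rng.standard_normal((6, 8))

# Reference: full softmax attention over all 6 KV positions.
s = q @ k.T / np.sqrt(8)
p = np.exp(s - s.max(axis=-1, keepdims=True))
ref = (p / p.sum(axis=-1, keepdims=True)) @ v

# Same result, with the KV cache processed in two blocks
# as consecutive steps of a ring pass would.
o1, l1, m1 = attend_block(q, k[:3], v[:3])
o2, l2, m2 = attend_block(q, k[3:], v[3:])
out, _, _ = merge(o1, l1, m1, o2, l2, m2)
```

Because the merge is exact, each device only ever needs one KV block in memory at a time, which is what makes arbitrarily long contexts feasible.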

AI-Software-Startups

A Survey of AI startups

Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 324 | Issues: 9 | Issues: 10

msccl

Microsoft Collective Communication Library

Language: C++ | License: NOASSERTION | Stargazers: 283 | Issues: 13 | Issues: 27

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

MatmulTutorial

An easy-to-understand TensorOp matmul tutorial

Language: C++ | License: Apache-2.0 | Stargazers: 228 | Issues: 8 | Issues: 9

mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

Language: C++ | License: MIT | Stargazers: 192 | Issues: 17 | Issues: 84

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python | License: MIT | Stargazers: 176 | Issues: 3 | Issues: 21
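
As context for the entry above: asymmetric b-bit quantization stores each value as an integer offset from a per-group minimum together with a per-group scale (KIVI's observation is to group the key cache per-channel and the value cache per-token). A minimal NumPy sketch of the arithmetic, with hypothetical helper names:

```python
import numpy as np

def quantize_asym(x, bits=2, axis=0):
    """Asymmetric quantization: q = round((x - min) / scale), stored as uint8."""
    levels = 2 ** bits - 1
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)  # guard constant groups
    q = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float64) * scale + lo

rng = np.random.default_rng(0)
k_cache = rng.standard_normal((8, 16))                    # (tokens, head_dim)
q2, scale, zero = quantize_asym(k_cache, bits=2, axis=0)  # per-channel (keys)
k_hat = dequantize(q2, scale, zero)
max_err = np.abs(k_cache - k_hat).max()                   # bounded by scale / 2
```

With 2 bits there are only four levels per group, so the round-trip error is bounded by half the group's scale; the KV cache shrinks roughly 8x versus fp16 at the cost of that bounded error.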

scattermoe

Triton-based implementation of Sparse Mixture of Experts.

Language: Python | License: Apache-2.0 | Stargazers: 152 | Issues: 5 | Issues: 12
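
For readers unfamiliar with the sparse MoE routing that scattermoe implements efficiently in Triton: each token is routed to its top-k experts by a gating projection, and the selected experts' outputs are mixed by renormalized gate weights. A dense, purely illustrative NumPy sketch (the real kernel scatters tokens to experts instead of looping per token):

```python
import numpy as np

def topk_moe(x, w_gate, experts, k=2):
    """Route each token to its k highest-scoring experts; mix their
    outputs by a softmax over the selected gate logits."""
    logits = x @ w_gate                        # (tokens, n_experts)
    idx = np.argsort(-logits, axis=-1)[:, :k]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                # dense loop; real kernels scatter
        g = logits[t, idx[t]]
        g = np.exp(g - g.max())
        g /= g.sum()                           # renormalized gate weights
        for j, e in enumerate(idx[t]):
            out[t] += g[j] * experts[e](x[t])
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 8))
w_gate = rng.standard_normal((8, 4))
experts = [lambda h, s=s: h * (s + 1) for s in range(4)]  # 4 toy experts
out = topk_moe(x, w_gate, experts, k=2)
```

With k much smaller than the expert count, each token pays for only k expert FLOPs, which is the whole point of sparse MoE layers.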

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 112 | Issues: 4 | Issues: 6

flux

A fast communication-overlapping library for tensor parallelism on GPUs.

Language: C++ | License: Apache-2.0 | Stargazers: 104 | Issues: 7 | Issues: 7

libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

Language: C++ | License: BSD-3-Clause | Stargazers: 79 | Issues: 15 | Issues: 5

SwiftTransformer

High performance Transformer implementation in C++.

Language: C++ | Stargazers: 49 | Issues: 1 | Issues: 0

Odysseus-Transformer

Odysseus: Playground of LLM Sequence Parallelism