Amanda-Barbara

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python | Apache-2.0 | 2611 29 254

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

MIT | 2479 19 54

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | Apache-2.0 | 1533 34 223

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | Apache-2.0 | 718 13 60

LLaVA-NeXT

Language: Python | 696 17 39

CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | GPL-3.0 | 658 8 5

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python | Apache-2.0 | 550 23 388

EAGLE

[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Language: Python | Apache-2.0 | 545 12 69

Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

MIT | 452 15 3

MiniGPT4-video

Official code for MiniGPT4-video

Language: Python | BSD-3-Clause | 415 10 27

ring-flash-attention

Ring attention implementation with flash attention

Language: Python | 389 9 19

Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models

Language: Python | Apache-2.0 | 295 9 6

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Language: Python | 173 4 9

vllm_backend

Language: Python | BSD-3-Clause | 130 40

llm_long_context_bench202405

Language: Python | Apache-2.0 | 2000

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community will contribute to this project.

MIT | 500