There are 0 repositories under the llm-infra topic.
Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without degrading end-to-end metrics across language, image, and video models.
A lightweight Bun + Express template that connects to the Testune AI API and streams chat responses in real time using Server-Sent Events (SSE).