There are 0 repositories under the llm-infra topic.
Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without degrading end-to-end metrics across language, image, and video models.
A lightweight Bun + Express template that connects to the Testune AI API and streams chat responses in real time using Server-Sent Events (SSE).