Eric Auld (ericauld)

Location: LA & SF

Home Page: ericauld.github.io

Twitter: @aulderic

Eric Auld's repositories

flash-attention

Fast and memory-efficient exact attention (minimal usage sketch below)

Language: Python · License: BSD-3-Clause · Stargazers: 1 · Issues: 0
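
A minimal usage sketch, assuming the flash_attn package is installed with CUDA support; flash_attn_func is the package's public entry point, but the shapes and dtypes here are illustrative:

    import torch
    from flash_attn import flash_attn_func

    # Shapes are (batch, seqlen, num_heads, head_dim); fp16/bf16 on a CUDA device.
    q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

    # Exact (not approximate) attention, computed without materializing
    # the full seqlen x seqlen attention matrix.
    out = flash_attn_func(q, k, v, causal=True)  # same shape as q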

cccl

CUDA Core Compute Libraries

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

qmk_firmware

QMK, forked for ZSA's Oryx Configurator (to safeguard stability)

Language: C · License: GPL-2.0 · Stargazers: 0 · Issues: 0

QuIP

Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" (a generic 2-bit round-trip sketch follows below)

Language: Python · Stargazers: 0 · Issues: 0
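
QuIP itself relies on incoherence processing and adaptive rounding to get its guarantees; the sketch below is only a naive per-tensor 2-bit uniform quantizer, included to make "2-bit" concrete, not the repo's method:

    import torch

    def quantize_2bit(w: torch.Tensor):
        # Naive uniform quantization to 4 levels (2 bits). Illustrative only;
        # QuIP's actual algorithm is adaptive rounding with incoherence processing.
        lo, hi = w.min(), w.max()
        scale = (hi - lo) / 3  # 4 levels -> 3 intervals
        q = torch.clamp(torch.round((w - lo) / scale), 0, 3).to(torch.uint8)
        return q, scale, lo

    def dequantize_2bit(q, scale, lo):
        return q.to(torch.float32) * scale + lo

    w = torch.randn(4, 4)
    q, scale, lo = quantize_2bit(w)
    w_hat = dequantize_2bit(q, scale, lo)  # coarse 4-level reconstruction of w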

resource-stream

CUDA-related news and material links

License: MIT · Stargazers: 0 · Issues: 0

Sequoia

A scalable and robust tree-based speculative decoding algorithm (a schematic accept/reject sketch follows below)

Language: Python · Stargazers: 0 · Issues: 0
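
Not Sequoia's tree algorithm, just a schematic of the chain-style speculative decoding it generalizes: a small draft model proposes tokens and the target model keeps the longest agreeing prefix. Here draft_logits and target_logits are hypothetical callables returning next-token logits:

    import torch

    def speculative_step(target_logits, draft_logits, prefix, k=4):
        # Draft k tokens greedily with the cheap model.
        ctx = list(prefix)
        proposed = []
        for _ in range(k):
            tok = int(torch.argmax(draft_logits(ctx)))
            proposed.append(tok)
            ctx.append(tok)
        # Verify with the target model; accept until the first disagreement.
        ctx, accepted = list(prefix), []
        for tok in proposed:
            target_tok = int(torch.argmax(target_logits(ctx)))
            accepted.append(target_tok)
            ctx.append(target_tok)
            if target_tok != tok:
                break  # first disagreement: keep the target's token and stop
        return list(prefix) + accepted

This accepts draft tokens wherever the two models' greedy choices agree; Sequoia's contribution is organizing the drafts into a tree so more tokens can be verified per target-model pass.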

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs (minimal offline-generation sketch below)

License: Apache-2.0 · Stargazers: 0 · Issues: 0
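
A minimal offline-generation sketch with vLLM's LLM/SamplingParams API (the model name is just an example; assumes vllm is installed on a CUDA machine):

    from vllm import LLM, SamplingParams

    # Any HuggingFace-style causal LM that vLLM supports will do here.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    outputs = llm.generate(["CUDA kernels are"], params)
    print(outputs[0].outputs[0].text)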