Eric Auld (ericauld)

Location: LA & SF

Home Page: ericauld.github.io

Twitter: @aulderic

Eric Auld's repositories

flash-attention

Fast and memory-efficient exact attention (minimal usage sketch below)

Language: Python · License: BSD-3-Clause · Stargazers: 1 · Issues: 0
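
A minimal usage sketch, assuming the flash_attn package is installed with CUDA support; flash_attn_func is the package's public entry point, but the shapes and dtypes here are illustrative:

    import torch
    from flash_attn import flash_attn_func

    # Shapes are (batch, seqlen, num_heads, head_dim); fp16/bf16 on a CUDA device.
    q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

    # Exact (not approximate) attention, computed without materializing
    # the full seqlen x seqlen attention matrix.
    out = flash_attn_func(q, k, v, causal=True)  # same shape as q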

cccl

CUDA Core Compute Libraries

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

qmk_firmware

QMK, forked for ZSA's Oryx Configurator (to safeguard stability)

Language: C · License: GPL-2.0 · Stargazers: 0 · Issues: 0

QuIP

Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" (a generic 2-bit round-trip sketch follows below)

Language: Python · Stargazers: 0 · Issues: 0
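
QuIP itself relies on incoherence processing and adaptive rounding to get its guarantees; the sketch below is only a naive per-tensor 2-bit uniform quantizer, included to make "2-bit" concrete, not the repo's method:

    import torch

    def quantize_2bit(w: torch.Tensor):
        # Naive uniform quantization to 4 levels (2 bits). Illustrative only;
        # QuIP's actual algorithm is adaptive rounding with incoherence processing.
        lo, hi = w.min(), w.max()
        scale = (hi - lo) / 3  # 4 levels -> 3 intervals
        q = torch.clamp(torch.round((w - lo) / scale), 0, 3).to(torch.uint8)
        return q, scale, lo

    def dequantize_2bit(q, scale, lo):
        return q.to(torch.float32) * scale + lo

    w = torch.randn(4, 4)
    q, scale, lo = quantize_2bit(w)
    w_hat = dequantize_2bit(q, scale, lo)  # coarse 4-level reconstruction of w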

resource-stream

CUDA-related news and material links

License: MIT · Stargazers: 0 · Issues: 0

Sequoia

A scalable and robust tree-based speculative decoding algorithm (a schematic accept/reject sketch follows below)

Language: Python · Stargazers: 0 · Issues: 0
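
Not Sequoia's tree algorithm, just a schematic of the chain-style speculative decoding it generalizes: a small draft model proposes tokens and the target model keeps the longest agreeing prefix. Here draft_logits and target_logits are hypothetical callables returning next-token logits:

    import torch

    def speculative_step(target_logits, draft_logits, prefix, k=4):
        # Draft k tokens greedily with the cheap model.
        ctx = list(prefix)
        proposed = []
        for _ in range(k):
            tok = int(torch.argmax(draft_logits(ctx)))
            proposed.append(tok)
            ctx.append(tok)
        # Verify with the target model; accept until the first disagreement.
        ctx, accepted = list(prefix), []
        for tok in proposed:
            target_tok = int(torch.argmax(target_logits(ctx)))
            accepted.append(target_tok)
            ctx.append(target_tok)
            if target_tok != tok:
                break  # first disagreement: keep the target's token and stop
        return list(prefix) + accepted

This accepts draft tokens wherever the two models' greedy choices agree; Sequoia's contribution is organizing the drafts into a tree so more tokens can be verified per target-model pass.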

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs (minimal offline-generation sketch below)

License: Apache-2.0 · Stargazers: 0 · Issues: 0
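
A minimal offline-generation sketch with vLLM's LLM/SamplingParams API (the model name is just an example; assumes vllm is installed on a CUDA machine):

    from vllm import LLM, SamplingParams

    # Any HuggingFace-style causal LM that vLLM supports will do here.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    outputs = llm.generate(["CUDA kernels are"], params)
    print(outputs[0].outputs[0].text)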