Robert Washbourne's starred repositories
dspy-rag-fastapi
FastAPI wrapper around DSPy
memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
aphrodite-engine
PygmalionAI's large-scale inference engine
tensorrtllm_backend
The Triton TensorRT-LLM Backend
neural-cherche
Neural Search
llama2-burn
Llama 2 LLM ported to Rust with the Burn framework
token-hawk
Hand-tuned WebGPU LLM inference
refunction
Reusing containers for faster serverless function execution - Master's project @ Imperial College
serverless-dns
The RethinkDNS resolver that deploys to Cloudflare Workers, Deno Deploy, Fastly, and Fly.io
MEGABYTE-pytorch
Implementation of MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch
recurrent-memory-transformer
[NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.
compressive-transformer-pytorch
PyTorch implementation of Compressive Transformers, from DeepMind
block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer in PyTorch