Robert Washbourne's starred repositories

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: NOASSERTION · Stars: 222
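
For reference, a minimal offline-inference sketch, assuming nm-vllm keeps the upstream vLLM `LLM` / `SamplingParams` interface and the `vllm` import path (the model name is just an example):

```python
from vllm import LLM, SamplingParams  # nm-vllm is a vLLM fork; import path assumed unchanged

# Example checkpoint; swap in any supported model.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```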

litgpt

Load, pretrain, fine-tune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: FlashAttention, FSDP, 4-bit quantization, LoRA, and more.

Language: Python · License: Apache-2.0 · Stars: 8049
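
A minimal load-and-prompt sketch, assuming the high-level Python API (`litgpt.LLM`) available in recent LitGPT releases; earlier versions expose the same workflows through the `litgpt` CLI instead:

```python
from litgpt import LLM

# Example checkpoint; LitGPT supports 20+ model families.
llm = LLM.load("microsoft/phi-2")
print(llm.generate("What do llamas eat?"))
```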

lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports FlashAttention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

Language: Python · License: Apache-2.0 · Stars: 5876

ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

Language: Python · License: Apache-2.0 · Stars: 5514
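
A minimal evaluation sketch, assuming the ragas `evaluate` API over a Hugging Face `datasets.Dataset` with the standard question/answer/contexts columns (a judge LLM, e.g. via `OPENAI_API_KEY`, must be available to score):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Toy single-row evaluation set; a real pipeline would log these fields per query.
data = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["It was completed in 1889."],
    "contexts": [["The Eiffel Tower was completed in 1889 for the World's Fair."]],
}
dataset = Dataset.from_dict(data)

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores for the RAG pipeline
```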

REMEDI

Inspecting and Editing Knowledge Representations in Language Models

Language: Python · License: MIT · Stars: 102

litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)

Language: Python · License: NOASSERTION · Stars: 10016
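
A minimal sketch of the provider-agnostic call, assuming the relevant API key (e.g. `OPENAI_API_KEY`) is set in the environment; switching providers only changes the model string:

```python
from litellm import completion

# Same OpenAI-style call shape regardless of the backend provider.
response = completion(
    model="gpt-3.5-turbo",  # e.g. "anthropic/claude-3-haiku-20240307" or "ollama/llama2"
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."}],
)
print(response.choices[0].message.content)  # OpenAI-format response object
```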

dspy-rag-fastapi

FastAPI wrapper around DSPy

Language: Python · License: MIT · Stars: 159
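
The repo's actual routes aren't shown here; as a hypothetical sketch of the idea, a DSPy program can be exposed behind a FastAPI endpoint roughly like this (the `/query` route, `Query` model, and LM configuration are illustrative assumptions, and the LM setup varies by DSPy version):

```python
import dspy
from fastapi import FastAPI
from pydantic import BaseModel

# Illustrative LM configuration; newer DSPy versions use dspy.LM(...) instead.
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

qa = dspy.ChainOfThought("question -> answer")  # simple DSPy program standing in for a RAG pipeline
app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/query")  # hypothetical route name
def query(q: Query):
    prediction = qa(question=q.question)
    return {"answer": prediction.answer}
```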

memory-compressed-attention

Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"

Language: Python · License: MIT · Stars: 72

snorkel

A system for quickly generating training data with weak supervision

Language: Python · License: Apache-2.0 · Stars: 5740
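
A minimal weak-supervision sketch, assuming the Snorkel 0.9-style labeling API (heuristic labeling functions applied over a pandas DataFrame, then combined and denoised with `LabelModel`):

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_link(x):
    # Weak heuristic: messages with URLs tend to be spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df = pd.DataFrame({"text": ["check out http://example.com now!!!", "thanks, see you tomorrow"]})
L_train = PandasLFApplier(lfs=[lf_contains_link, lf_short_message]).apply(df)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100)
print(label_model.predict(L_train))  # denoised labels combined from the noisy LFs
```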

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python · License: AGPL-3.0 · Stars: 738

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stars: 575

raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Language: Python · License: MIT · Stars: 653

neural-cherche

Neural Search

Language: Python · License: MIT · Stars: 318

llama2-burn

The Llama 2 LLM ported to the Rust Burn framework

Language: Rust · License: MIT · Stars: 263

burn

Burn is a comprehensive, dynamic deep learning framework built in Rust, with flexibility, compute efficiency, and portability as its primary goals.

Language: Rust · License: Apache-2.0 · Stars: 7624

token-hawk

WebGPU LLM inference tuned by hand

Language: C++ · License: MIT · Stars: 143

refunction

Reusing containers for faster serverless function execution. Master's project @ Imperial College.

Language: Go · License: AGPL-3.0 · Stars: 20

serverless-dns

The RethinkDNS resolver that deploys to Cloudflare Workers, Deno Deploy, Fastly, and Fly.io

Language: JavaScript · License: MPL-2.0 · Stars: 1784

workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers

Language: C++ · License: Apache-2.0 · Stars: 5854

wonnx

A WebGPU-accelerated ONNX inference runtime written 100% in Rust, ready for native and the web

Language: Rust · License: NOASSERTION · Stars: 1541

LongMamba

Some preliminary explorations of Mamba's context scaling.

Language: Python · Stars: 171

mamba

The Mamba state-space model (SSM) architecture

Language: Python · License: Apache-2.0 · Stars: 11324
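
A minimal forward-pass sketch, essentially the block-level example from the project README (requires a CUDA GPU and the `mamba_ssm` package):

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

block = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = block(x)
assert y.shape == x.shape  # the block is shape-preserving, like a Transformer layer
```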

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

Language: Python · License: MIT · Stars: 595

danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

Language: Python · License: NOASSERTION · Stars: 9698

recurrent-memory-transformer

[NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.

Language: Jupyter Notebook · Stars: 746

compressive-transformer-pytorch

PyTorch implementation of the Compressive Transformer, from DeepMind

Language: Python · License: MIT · Stars: 154

block-recurrent-transformer-pytorch

Implementation of the Block-Recurrent Transformer, in PyTorch

Language: Python · License: MIT · Stars: 207