Dean Wyatte (dwyatte)


Company: @square

Location: Boulder, CO


Dean Wyatte's starred repositories

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stargazers: 97 · Issues: 0

deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024]

Language: Python · License: Apache-2.0 · Stargazers: 385 · Issues: 0

infinity

The AI-native database built for LLM applications, providing incredibly fast full-text and vector search

Language: C++ · License: Apache-2.0 · Stargazers: 1958 · Issues: 0

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell · Stargazers: 4480 · Issues: 0

prometheus-eval

Evaluate your LLM's response with Prometheus 💯

Language: Python · License: Apache-2.0 · Stargazers: 602 · Issues: 0

gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

Language: Python · License: NOASSERTION · Stargazers: 550 · Issues: 0

OpenLineage

An Open Standard for lineage metadata collection

Language: Java · License: Apache-2.0 · Stargazers: 1619 · Issues: 0

Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models

Language: Python · License: Apache-2.0 · Stargazers: 295 · Issues: 0

TriForce

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Language: Python · Stargazers: 129 · Issues: 0

EAGLE

[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Language: Python · License: Apache-2.0 · Stargazers: 545 · Issues: 0

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python · License: AGPL-3.0 · Stargazers: 682 · Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · Stargazers: 3820 · Issues: 0

lighteval

LightEval is a lightweight LLM evaluation suite used internally by Hugging Face alongside its recently released LLM data-processing library datatrove and LLM training library nanotron.

Language: Python · License: MIT · Stargazers: 438 · Issues: 0

JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

Language: Python · License: Apache-2.0 · Stargazers: 152 · Issues: 0

mergoo

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

Language: Python · License: LGPL-3.0 · Stargazers: 339 · Issues: 0

JetMoE

Reaching LLaMA2 Performance with 0.1M Dollars

Language: Python · License: Apache-2.0 · Stargazers: 935 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 4223 · Issues: 0

submitit

Python 3.8+ toolbox for submitting jobs to Slurm

Language: Python · License: MIT · Stargazers: 1142 · Issues: 0

chronon

Chronon is a data platform for serving data to AI/ML applications.

Language: Scala · License: Apache-2.0 · Stargazers: 650 · Issues: 0

tensorrt_backend

The Triton backend for TensorRT.

Language: C++ · License: BSD-3-Clause · Stargazers: 51 · Issues: 0

optimum-benchmark

A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Language: Python · License: Apache-2.0 · Stargazers: 206 · Issues: 0

onnxruntime-genai

Generative AI extensions for onnxruntime

Language: C++ · License: MIT · Stargazers: 233 · Issues: 0

onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX

Language: C++ · License: Apache-2.0 · Stargazers: 2799 · Issues: 0

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: NOASSERTION · Stargazers: 217 · Issues: 0

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Language: Python · License: Apache-2.0 · Stargazers: 1728 · Issues: 0

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 471 · Issues: 0

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · Stargazers: 3103 · Issues: 0

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python · License: MIT · Stargazers: 1335 · Issues: 0

functionary

Chat language model that can use tools and interpret the results

Language: Python · License: MIT · Stargazers: 1163 · Issues: 0

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · Stargazers: 1195 · Issues: 0