Dean Wyatte (dwyatte)


Company: @square

Location: Boulder, CO


Dean Wyatte's starred repositories

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stargazers: 97 · Issues: 0

deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024]

Language: Python · License: Apache-2.0 · Stargazers: 385 · Issues: 0

infinity

The AI-native database built for LLM applications, providing incredibly fast full-text and vector search

Language: C++ · License: Apache-2.0 · Stargazers: 1958 · Issues: 0

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell · Stargazers: 4480 · Issues: 0

prometheus-eval

Evaluate your LLM's response with Prometheus 💯

Language: Python · License: Apache-2.0 · Stargazers: 602 · Issues: 0

gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

Language: Python · License: NOASSERTION · Stargazers: 550 · Issues: 0

OpenLineage

An Open Standard for lineage metadata collection

Language: Java · License: Apache-2.0 · Stargazers: 1619 · Issues: 0

Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models

Language: Python · License: Apache-2.0 · Stargazers: 295 · Issues: 0

TriForce

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Language: Python · Stargazers: 129 · Issues: 0

EAGLE

[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Language: Python · License: Apache-2.0 · Stargazers: 545 · Issues: 0

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python · License: AGPL-3.0 · Stargazers: 682 · Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · Stargazers: 3820 · Issues: 0

lighteval

LightEval is a lightweight LLM evaluation suite used internally by Hugging Face alongside its recently released LLM data-processing library datatrove and LLM training library nanotron.

Language: Python · License: MIT · Stargazers: 438 · Issues: 0

JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

Language: Python · License: Apache-2.0 · Stargazers: 152 · Issues: 0

mergoo

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

Language: Python · License: LGPL-3.0 · Stargazers: 339 · Issues: 0

JetMoE

Reaching LLaMA2 Performance with 0.1M Dollars

Language: Python · License: Apache-2.0 · Stargazers: 935 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 4223 · Issues: 0

submitit

Python 3.8+ toolbox for submitting jobs to Slurm

Language: Python · License: MIT · Stargazers: 1142 · Issues: 0

chronon

Chronon is a data platform for serving data to AI/ML applications.

Language: Scala · License: Apache-2.0 · Stargazers: 650 · Issues: 0

tensorrt_backend

The Triton backend for TensorRT.

Language: C++ · License: BSD-3-Clause · Stargazers: 51 · Issues: 0

optimum-benchmark

A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Language: Python · License: Apache-2.0 · Stargazers: 206 · Issues: 0

onnxruntime-genai

Generative AI extensions for onnxruntime

Language: C++ · License: MIT · Stargazers: 233 · Issues: 0

onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX

Language: C++ · License: Apache-2.0 · Stargazers: 2799 · Issues: 0

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: NOASSERTION · Stargazers: 217 · Issues: 0

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Language: Python · License: Apache-2.0 · Stargazers: 1728 · Issues: 0

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 471 · Issues: 0

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · Stargazers: 3103 · Issues: 0

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python · License: MIT · Stargazers: 1335 · Issues: 0

functionary

Chat language model that can use tools and interpret the results

Language: Python · License: MIT · Stargazers: 1163 · Issues: 0

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · Stargazers: 1195 · Issues: 0