Hoagy's starred repositories

CyberChef

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

Language: JavaScript | License: Apache-2.0 | Stargazers: 25936 | Issues: 376 | Issues: 921

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | License: GPL-3.0 | Stargazers: 9067 | Issues: 46 | Issues: 95

sharedrop

Easy P2P file transfer powered by WebRTC - inspired by Apple AirDrop

Language: JavaScript | License: MIT | Stargazers: 8530 | Issues: 103 | Issues: 155

vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Language: Python | License: MIT | Stargazers: 7395 | Issues: 45 | Issues: 214

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python | License: GPL-3.0 | Stargazers: 7051 | Issues: 64 | Issues: 71

AlphaCodium

Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"

Language: Python | License: AGPL-3.0 | Stargazers: 3163 | Issues: 48 | Issues: 15

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python | License: Apache-2.0 | Stargazers: 2495 | Issues: 30 | Issues: 228

Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

mixtral-offloading

Run Mixtral-8x7B models in Colab or on consumer desktops

Language: Python | License: MIT | Stargazers: 2258 | Issues: 30 | Issues: 24

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1397 | Issues: 39 | Issues: 61

self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI

Language: Python | License: MIT | Stargazers: 1254 | Issues: 23 | Issues: 17

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python | License: Apache-2.0 | Stargazers: 824 | Issues: 40 | Issues: 54

tensordict

TensorDict is a PyTorch-dedicated tensor container.

Language: Python | License: MIT | Stargazers: 606 | Issues: 27 | Issues: 87

HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Language: Python | License: Apache-2.0 | Stargazers: 583 | Issues: 6 | Issues: 18
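To illustrate one of the losses named in the HALOs description, here is a minimal sketch of the standard DPO objective for a single preference pair, computed from summed sequence log-probabilities under the policy and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are assumptions for illustration; this is not the HALOs library's API.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    Hypothetical helper: inputs are summed log-probs of each full response.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow in exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference, the margin is zero and the loss is log 2; shifting probability toward the chosen response lowers it.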

wanda

A simple and effective LLM pruning approach.

Language: Python | License: MIT | Stargazers: 538 | Issues: 9 | Issues: 50
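The Wanda criterion (pruning by weights and activations) scores each weight by its magnitude times the L2 norm of the matching input feature over a calibration batch, then removes the lowest-scoring weights within each output row. Below is a minimal pure-Python sketch, assuming a dense weight matrix given as nested lists and unstructured 50% sparsity. The name `wanda_prune` is hypothetical, and the repository itself operates on PyTorch layers, which this sketch does not attempt to reproduce.

```python
import math

def wanda_prune(weights, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row (hypothetical helper).

    weights: list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (num_samples, in_features)
    """
    in_features = len(weights[0])
    # L2 norm of each input feature across all calibration samples.
    feat_norm = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_features)]
    num_prune = int(in_features * sparsity)
    pruned = []
    for row in weights:
        # Score = |weight| * norm of the input feature it multiplies.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        # Compare only within this output row; drop the num_prune smallest.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:num_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

weights = [[1.0, -2.0, 0.5, 3.0], [0.2, 0.1, -4.0, 1.0]]
activations = [[10.0, 0.1, 1.0, 0.1], [10.0, 0.1, 1.0, 0.1]]
result = wanda_prune(weights, activations)
```

In the first row, magnitude-only pruning would keep `3.0` and `-2.0`; weighting by activation norms keeps `1.0` and `0.5` instead, because their inputs are much larger.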

EAGLE

[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Language: Python | License: Apache-2.0 | Stargazers: 524 | Issues: 10 | Issues: 61
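EAGLE builds on speculative sampling: a cheap draft model proposes tokens, and the target model accepts or rejects each one so that the emitted tokens still follow the target distribution. The sketch below shows the standard acceptance rule for one drafted token over a toy vocabulary: accept with probability min(1, p/q); otherwise resample from the renormalized residual max(0, p - q). EAGLE's own contribution, feature-level drafting, is not shown, and `verify_draft_token` is a hypothetical name.

```python
import random

def verify_draft_token(token, p, q, rng=random):
    """Accept or replace one drafted token (hypothetical helper).

    token: index the draft model sampled from q
    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the vocabulary
    Returns (accepted, emitted_token).
    """
    if rng.random() < min(1.0, p[token] / q[token]):
        return True, token
    # Rejected: sample from max(0, p - q), renormalized. Its total mass is
    # positive whenever a rejection is possible (p[token] < q[token]).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    r = rng.random() * sum(residual)
    for i, w in enumerate(residual):
        r -= w
        if r < 0:
            return False, i
    # Guard against floating-point round-off: fall back to the largest residual.
    return False, max(range(len(p)), key=lambda i: residual[i])
```

Drafting from `q` and then applying this check yields tokens distributed according to `p`, which is what lets speculative decoding speed up generation without changing the target model's output distribution.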

SwiftInfer

Efficient AI Inference & Serving

Language: Python | License: Apache-2.0 | Stargazers: 437 | Issues: 5 | Issues: 6

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python | License: Apache-2.0 | Stargazers: 374 | Issues: 13 | Issues: 19
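Roughly speaking, the ~4x ceiling in the marlin description comes from the weight format: an INT4 weight takes a quarter of the bits of an FP16 weight, so a memory-bound matrix product reads about 4x fewer weight bytes. Below is a minimal pure-Python sketch of that storage idea, assuming a simple symmetric scale shared by all weights and two 4-bit values packed per byte. All names are hypothetical, and this is not Marlin's actual GPU layout or packing order.

```python
def quantize_int4(weights, scale):
    """Map floats to signed 4-bit ints in [-8, 7] with one shared scale."""
    return [max(-8, min(7, round(w / scale))) for w in weights]

def pack_int4(q):
    """Pack pairs of signed 4-bit values into bytes, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count without mutating the input
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)

def unpack_dequantize(packed, scale, n):
    """Recover n floats: split nibbles, sign-extend, multiply by the scale."""
    out = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            val = nib - 16 if nib >= 8 else nib  # two's-complement sign extension
            out.append(val * scale)
    return out[:n]

weights = [0.12, -0.5, 0.33, 0.01, -0.26, 0.7]
scale = 0.1
q = quantize_int4(weights, scale)
packed = pack_int4(q)
restored = unpack_dequantize(packed, scale, len(weights))
print(len(weights) * 2, "bytes as FP16 vs", len(packed), "bytes as INT4")
```

Six FP16 weights occupy 12 bytes, while the packed INT4 form takes 3. The price is a rounding error of up to half the scale per weight, which real kernels keep small by using a separate scale per group of weights.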

ALMA

State-of-the-art LLM-based translation models.

Language: Ruby | License: MIT | Stargazers: 320 | Issues: 10 | Issues: 32

xmc.dspy

In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.

Language: Python | License: MIT | Stargazers: 313 | Issues: 23 | Issues: 8

openlogprobs

Extract full next-token probabilities via language model APIs

laserRMT

This is our own implementation of 'Layer Selective Rank Reduction'

Language: Python | License: Apache-2.0 | Stargazers: 214 | Issues: 10 | Issues: 7

SeeClick

The model, data and code for the visual GUI Agent SeeClick

Language: HTML | License: Apache-2.0 | Stargazers: 115 | Issues: 1 | Issues: 31
Language: Python | License: MIT | Stargazers: 104 | Issues: 4 | Issues: 5

genalog

Genalog is an open-source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.

Language: Jupyter Notebook | License: MIT | Stargazers: 42 | Issues: 2 | Issues: 0

ClusterLLM

LLM guided text clustering

LLM-MCQ-Bias

Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"

Language: Python | Stargazers: 20 | Issues: 0 | Issues: 0

EnsembleForecasting

Using multiple LLMs for ensemble forecasting

Language: Jupyter Notebook | Stargazers: 16 | Issues: 0 | Issues: 0

BiTA

An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.

Language: Python | License: Apache-2.0 | Stargazers: 11 | Issues: 0 | Issues: 0
Language: Python | License: MIT | Stargazers: 8 | Issues: 0 | Issues: 0