Vladimir Gurevich's starred repositories

whisper.cpp

Port of OpenAI's Whisper model in C/C++

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | Stargazers: 17451 | Issues: 143 | Issues: 744

Bend

A massively parallel, high-level programming language

Language: Rust | License: Apache-2.0 | Stargazers: 17267 | Issues: 91 | Issues: 251

outlines

Structured Text Generation

Language: Python | License: Apache-2.0 | Stargazers: 8187 | Issues: 47 | Issues: 553

ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

Language: Python | License: Apache-2.0 | Stargazers: 6767 | Issues: 36 | Issues: 744

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 4533 | Issues: 108 | Issues: 134

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python | License: MIT | Stargazers: 3545 | Issues: 65 | Issues: 103

usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Language: C++ | License: Apache-2.0 | Stargazers: 2148 | Issues: 26 | Issues: 148

prompt2model

prompt2model - Generate Deployable Models from Natural Language Instructions

Language: Python | License: Apache-2.0 | Stargazers: 1946 | Issues: 25 | Issues: 168

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python | License: MIT | Stargazers: 1765 | Issues: 18 | Issues: 80

voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

Language: C++ | License: Apache-2.0 | Stargazers: 1286 | Issues: 13 | Issues: 28

SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Language: Python | License: NOASSERTION | Stargazers: 326 | Issues: 14 | Issues: 19

relik

Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)

stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Language: Python | License: MIT | Stargazers: 247 | Issues: 20 | Issues: 40

transformer-heads

Toolkit for attaching, training, saving and loading of new heads for transformer models

Language: Jupyter Notebook | License: MIT | Stargazers: 238 | Issues: 5 | Issues: 8

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 101 | Issues: 6 | Issues: 9

transformers-CFG

🤗 A specialized library for integrating context-free grammars (CFG) in EBNF with the Hugging Face Transformers

Language: Python | License: MIT | Stargazers: 83 | Issues: 3 | Issues: 37

riveter-nlp

Package to extract connotation frames

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 78 | Issues: 7 | Issues: 6

DiffusionLLM

Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"

MedTator

A Serverless Text Annotation Tool for Corpus Development

Language: JavaScript | License: Apache-2.0 | Stargazers: 51 | Issues: 4 | Issues: 12

neural-tree

Tree-based indexes for neural-search

Language: Python | License: MIT | Stargazers: 28 | Issues: 2 | Issues: 0

focus

[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"

Language: Python | License: MIT | Stargazers: 27 | Issues: 1 | Issues: 1

Hebrew-Question-Answering-Dataset

A question answering dataset in Modern Hebrew, containing 30,147 questions.

Language: Jupyter Notebook | License: CC-BY-4.0 | Stargazers: 18 | Issues: 4 | Issues: 1

Language: Python | License: GPL-3.0 | Stargazers: 15 | Issues: 5 | Issues: 2

charred

CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell

Language: Python | Stargazers: 14 | Issues: 4 | Issues: 0

Filtered-Semi-Markov-CRF

Code for our paper accepted at EMNLP 2023 (Findings)

Language: Python | Stargazers: 12 | Issues: 2 | Issues: 0

webie

Dataset for web-scaled information extraction.

Language: Python | License: NOASSERTION | Stargazers: 8 | Issues: 4 | Issues: 0

MEGA

Multilingual Evaluation of LLMs

License: MIT | Stargazers: 6 | Issues: 2 | Issues: 0

License: NOASSERTION | Stargazers: 2 | Issues: 0 | Issues: 0