Vladimir Gurevich's starred repositories

pylate

Late Interaction Models Training & Retrieval

Language: Python | License: MIT | Stargazers: 142 | Issues: 0

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python | License: BSD-2-Clause | Stargazers: 3114 | Issues: 0

relik

Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)

Language: Python | Stargazers: 297 | Issues: 0

outlines

Structured Text Generation

Language: Python | License: Apache-2.0 | Stargazers: 8405 | Issues: 0

Bend

A massively parallel, high-level programming language

Language: Rust | License: Apache-2.0 | Stargazers: 17266 | Issues: 0

transformer-heads

Toolkit for attaching, training, saving and loading of new heads for transformer models

Language: Jupyter Notebook | License: MIT | Stargazers: 238 | Issues: 0

neural-tree

Tree-based indexes for neural-search

Language: Python | License: MIT | Stargazers: 28 | Issues: 0

SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Language: Python | License: NOASSERTION | Stargazers: 326 | Issues: 0

transformers-CFG

🤗 A specialized library for integrating context-free grammars (CFG) in EBNF with Hugging Face Transformers

Language: Python | License: MIT | Stargazers: 83 | Issues: 0

Filtered-Semi-Markov-CRF

Code for our paper accepted at EMNLP 2023 (Findings)

Language: Python | Stargazers: 12 | Issues: 0

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C | License: MIT | Stargazers: 34793 | Issues: 0

MedTator

A Serverless Text Annotation Tool for Corpus Development

Language: JavaScript | License: Apache-2.0 | Stargazers: 51 | Issues: 0

focus

[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"

Language: Python | License: MIT | Stargazers: 27 | Issues: 0

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python | License: MIT | Stargazers: 3545 | Issues: 0

MEGA

Multilingual Evaluation of LLMs

License: MIT | Stargazers: 6 | Issues: 0
Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 101 | Issues: 0

voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

Language: C++ | License: Apache-2.0 | Stargazers: 1286 | Issues: 0

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python | License: MIT | Stargazers: 1766 | Issues: 0

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 4534 | Issues: 0

DiffusionLLM

Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"

Language: Python | Stargazers: 60 | Issues: 0

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | Stargazers: 17451 | Issues: 0

prompt2model

prompt2model - Generate Deployable Models from Natural Language Instructions

Language: Python | License: Apache-2.0 | Stargazers: 1946 | Issues: 0

usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Language: C++ | License: Apache-2.0 | Stargazers: 2148 | Issues: 0

riveter-nlp

Package to extract connotation frames

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 79 | Issues: 0

charred

CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell

Language: Python | Stargazers: 14 | Issues: 0

webie

Dataset for web-scaled information extraction.

Language: Python | License: NOASSERTION | Stargazers: 8 | Issues: 0
Language: C++ | License: AGPL-3.0 | Stargazers: 115 | Issues: 0
License: NOASSERTION | Stargazers: 2 | Issues: 0
Language: Python | License: GPL-3.0 | Stargazers: 15 | Issues: 0

Hebrew-Question-Answering-Dataset

A question answering dataset in Modern Hebrew, containing 30,147 questions.

Language: Jupyter Notebook | License: CC-BY-4.0 | Stargazers: 18 | Issues: 0