OhadRubin's starred repositories

newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

Language: Python | License: MIT | Stargazers: 13791 | Issues: 386 | Issues: 692

annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Language: C++ | License: Apache-2.0 | Stargazers: 12772 | Issues: 317 | Issues: 394

ml-engineering

Machine Learning Engineering Open Book

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 9935 | Issues: 100 | Issues: 18

awesome-langchain

😎 Awesome list of tools and projects with the awesome LangChain framework

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | Stargazers: 3497 | Issues: 47 | Issues: 168

scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

Language: Python | License: Apache-2.0 | Stargazers: 3036 | Issues: 39 | Issues: 229

scikit-llm

Seamlessly integrate LLMs into scikit-learn.

Language: Python | License: MIT | Stargazers: 2931 | Issues: 41 | Issues: 48

BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Language: Python | License: Apache-2.0 | Stargazers: 2692 | Issues: 48 | Issues: 149

datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python | License: MIT | Stargazers: 2365 | Issues: 48 | Issues: 162

LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

make-real-starter

Make it real

Language: TypeScript | License: AGPL-3.0 | Stargazers: 1385 | Issues: 12 | Issues: 12

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

jaxopt

Hardware accelerated, batchable and differentiable optimizers in JAX.

Language: Python | License: Apache-2.0 | Stargazers: 894 | Issues: 19 | Issues: 199

bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Language: Python | License: Apache-2.0 | Stargazers: 654 | Issues: 13 | Issues: 116

wmd

Word Mover's Distance from Matthew J Kusner's paper "From Word Embeddings to Document Distances"

elia

A snappy, keyboard-centric terminal user interface for interacting with large language models. Chat with ChatGPT, Claude, Llama 3, Phi 3, Mistral, Gemma and more.

Language: Python | License: Apache-2.0 | Stargazers: 306 | Issues: 18 | Issues: 40

dpr-scale

Scalable training for dense retrieval models.

cookbook

Deep learning for dummies. All the practical details and useful utilities that go into working with real models.

Language: Python | License: Apache-2.0 | Stargazers: 183 | Issues: 6 | Issues: 10

EasyDeL

Accelerate your training with this open-source library. Optimize performance with streamlined training and serving options with JAX. 🚀

Language: Python | License: Apache-2.0 | Stargazers: 159 | Issues: 9 | Issues: 74

zero

ZeroMQ made easy with a few wrappers around pyzmq

Language: Python | License: MIT | Stargazers: 113 | Issues: 0 | Issues: 0

DuckTrack

Multimodal computer agent data collection program

Language: Python | License: MIT | Stargazers: 96 | Issues: 3 | Issues: 8
Language: Python | License: Apache-2.0 | Stargazers: 89 | Issues: 7 | Issues: 3

icl-ceil

[ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.

Language: Python | License: Apache-2.0 | Stargazers: 87 | Issues: 4 | Issues: 5

ezmup

Simple implementation of muP, based on Spectral Condition for Feature Learning

easy-elasticsearch

Using business-level retrieval system (BM25) with Python in just a few lines.

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 1 | Issues: 0

UDR

ACL'23: Unified Demonstration Retriever for In-Context Learning

pile_dedupe

Pile Deduplication Code

Language: Python | License: MIT | Stargazers: 14 | Issues: 1 | Issues: 1

multihost_dataloading

Experimenting with how best to do multi-host dataloading

Language: Python | Stargazers: 5 | Issues: 0 | Issues: 0

Hax-LLM

Hastur's experiments in scaling LLM to 10B+ parameters with JAX and TPUs

Language: Python | License: MIT | Stargazers: 4 | Issues: 2 | Issues: 0