OhadRubin's starred repositories
ml-engineering
Machine Learning Engineering Open Book
awesome-langchain
Awesome list of tools and projects with the awesome LangChain framework
open_flamingo
An open-source framework for training large multimodal models.
scikit-llm
Seamlessly integrate LLMs into scikit-learn.
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
make-real-starter
Make it real
coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
easy-elasticsearch
Using business-level retrieval system (BM25) with Python in just a few lines.
pile_dedupe
Pile Deduplication Code
multihost_dataloading
Experimenting with how best to do multi-host dataloading