Binoy Dalal's starred repositories
llm_distillation_playbook
Best practices for distilling large language models.
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
cv-sentence-extractor
Scraping Wikipedia for fair use sentences
llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
interviews.ai
It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant, will benefit from reading it; however, it is my hope that even the most experienced researchers will find it fascinating as well.
120-Data-Science-Interview-Questions
Answers to 120 commonly asked data science interview questions.
docTTTTTquery
docTTTTTquery document expansion model
matchmaker
Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch
CtCI-6th-Edition-Python
Cracking the Coding Interview 6th Ed. Python Solutions
onnx-simplifier
Simplify your onnx model
nlp-architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
pretrain-gnns
Strategies for Pre-training Graph Neural Networks
FBTT-Embedding
This is a Tensor Train-based compression library for compressing the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they do not reduce memory use at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000x reduction in memory footprint per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
langdetect
Port of Google's language-detection library to Python.
cs-video-courses
List of Computer Science courses with video lectures.
pytorch_geometric
Graph Neural Network Library for PyTorch
lw-k8s-workshop
Hands-on Labs for Fusion 5 Kubernetes Workshop