Zach Nussbaum's starred repositories

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

shap

A game theoretic approach to explain the output of any machine learning model.

Language: Jupyter Notebook | License: MIT | Stargazers: 21833 | Issues: 240 | Issues: 2461

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 14323 | Issues: 105 | Issues: 912

ml-engineering

Machine Learning Engineering Open Book

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 10012 | Issues: 103 | Issues: 18

GPU-Puzzles

Solve puzzles. Learn CUDA.

Language: Jupyter Notebook | License: MIT | Stargazers: 5148 | Issues: 28 | Issues: 25

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Language: Python | License: MIT | Stargazers: 3343 | Issues: 30 | Issues: 248

rdkit

The official sources for the RDKit library

Language: HTML | License: BSD-3-Clause | Stargazers: 2469 | Issues: 83 | Issues: 3107

my_notes

My small cheatsheets for data science, ML, computer science and more.

nomic

Interact, analyze and structure massive text, image, embedding, audio and video datasets

dolma

Data and tools for generating and inspecting OLMo pre-training data.

Language: Python | License: Apache-2.0 | Stargazers: 809 | Issues: 17 | Issues: 59

selfies

Robust representation of semantically constrained graphs, in particular for molecules in chemistry

Language: Python | License: Apache-2.0 | Stargazers: 614 | Issues: 24 | Issues: 69

ama_prompting

Ask Me Anything language model prompting

Language: Python | License: Apache-2.0 | Stargazers: 530 | Issues: 24 | Issues: 5

foldingdiff

Diffusion models of protein structure; trigonometry and attention are all you need!

Language: Jupyter Notebook | License: MIT | Stargazers: 468 | Issues: 17 | Issues: 17

awesome-pretrain-on-molecules

[IJCAI 2023 survey track] A curated list of resources for chemical pre-trained models

datamol

Molecular Processing Made Easy.

Language: Python | License: Apache-2.0 | Stargazers: 433 | Issues: 17 | Issues: 104

contrastors

Train Models Contrastively in Pytorch

Language: Python | License: Apache-2.0 | Stargazers: 427 | Issues: 12 | Issues: 23

hyde

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

Language: Jupyter Notebook | Stargazers: 375 | Issues: 5 | Issues: 6

electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

openbiomechanics

The open source initiative for anonymized, elite-level athletic motion capture data. Run by Driveline Baseball.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 193 | Issues: 13 | Issues: 0

foundation-models-for-dbt-entity-matching

Playground for using large language models into the Modern Data Stack for entity matching

Language: Python | License: MIT | Stargazers: 103 | Issues: 3 | Issues: 0

FLIP

A collection of tasks to probe the effectiveness of protein sequence representations in modeling aspects of protein design

Language: Jupyter Notebook | License: AFL-3.0 | Stargazers: 84 | Issues: 7 | Issues: 10

DDPMs-Pytorch

Implementation of various DDPM papers to understand how they work

Language: Python | License: MIT | Stargazers: 76 | Issues: 1 | Issues: 3

PEER_Benchmark

PEER Benchmark, appear at NeurIPS 2022 Dataset and Benchmark Track (https://arxiv.org/abs/2206.02096)

Language: Python | License: Apache-2.0 | Stargazers: 74 | Issues: 5 | Issues: 9

mt-metrics-eval

Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.

Language: Python | License: Apache-2.0 | Stargazers: 70 | Issues: 3 | Issues: 11

Language: Python | Stargazers: 26 | Issues: 0 | Issues: 0

ProbTransformer

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

Language: Python | License: Apache-2.0 | Stargazers: 16 | Issues: 8 | Issues: 2

awesome-ml-for-biochemistry

Community-curated resources for research at the intersection of AI and molecular sciences

License: CC0-1.0 | Stargazers: 14 | Issues: 2 | Issues: 0

datasets

bio-datasets

Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0