Ben Batorsky's starred repositories

professional-programming

A collection of learning resources for curious software engineers

Language: Python | License: MIT | Stargazers: 45977 | Issues: 984 | Issues: 28

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Language: Jupyter Notebook | License: MIT | Stargazers: 26524 | Issues: 1370 | Issues: 245

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 9011 | Issues: 87 | Issues: 699

stanford-cs-230-deep-learning

VIP cheatsheets for Stanford's CS 230 Deep Learning

Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

Language: JavaScript | License: Apache-2.0 | Stargazers: 5724 | Issues: 81 | Issues: 163

marvin

✨ Build AI interfaces that spark joy

Language: Python | License: Apache-2.0 | Stargazers: 5026 | Issues: 36 | Issues: 202

osmnx

OSMnx is a Python package to easily download, model, analyze, and visualize street networks and other geospatial features from OpenStreetMap.

Language: Python | License: MIT | Stargazers: 4762 | Issues: 114 | Issues: 651

llm-numbers

Numbers every LLM developer should know

Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

Language: Python | License: BSD-3-Clause | Stargazers: 2898 | Issues: 38 | Issues: 327

awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Language: Python | License: GPL-3.0 | Stargazers: 2203 | Issues: 77 | Issues: 19

tabula-py

Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame

Language: Python | License: MIT | Stargazers: 2119 | Issues: 46 | Issues: 279

ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 1943 | Issues: 24 | Issues: 63

pymc-resources

PyMC educational resources

Language: Jupyter Notebook | License: MIT | Stargazers: 1908 | Issues: 65 | Issues: 75

nlp-library

Curated collection of papers for the NLP practitioner 📖👩‍🔬

Local-LLM-Comparison-Colab-UI

Compare the performance of different LLMs that can be deployed locally on consumer hardware. Run it yourself with Colab WebUI.

Language: Jupyter Notebook | Stargazers: 933 | Issues: 27 | Issues: 10

wtte-rnn

WTTE-RNN: a framework for churn and time-to-event prediction

Language: Python | License: MIT | Stargazers: 762 | Issues: 43 | Issues: 62

transformers

A collection of resources to study Transformers in depth.

datamapplot

Creating beautiful plots of data maps

Language: Python | License: MIT | Stargazers: 452 | Issues: 11 | Issues: 17

snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Language: Java | License: NOASSERTION | Stargazers: 184 | Issues: 25 | Issues: 383

spacy-udpipe

spaCy + UDPipe

Language: Python | License: MIT | Stargazers: 159 | Issues: 11 | Issues: 26

crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.

Language: Jupyter Notebook | License: MIT | Stargazers: 113 | Issues: 27 | Issues: 117

COVID-19

Gathering weather data in locations with confirmed COVID-19 diagnoses. Confirmed diagnoses are from JHU data.

Language: Jupyter Notebook | Stargazers: 65 | Issues: 3 | Issues: 6

data-inventories

A simple script to look for and process all the federal data.json data inventories.

Language: JavaScript | Stargazers: 46 | Issues: 15 | Issues: 0

Benchmarking_past_present_future

Workshop Home Page for Benchmarking: Past, Present and Future

word2vec

Word2Vec in Python, using Tensorflow.

Language: Jupyter Notebook | Stargazers: 30 | Issues: 3 | Issues: 0

geolocChina

Native Geolocation of Chinese Strings in R (no API keys required)

pieces

Tech infrastructure components to assist with mutual aid projects

Language: JavaScript | Stargazers: 2 | Issues: 1 | Issues: 0