Matt Jordan (revbucket)

Location: Seattle, WA

Matt Jordan's repositories

geometric-certificates

Geometric Certifications of Neural Nets

lipMIP

Mixed-integer programming for computing Lipschitz constants of ReLU networks

Language: Jupyter Notebook | Stargazers: 17 | Issues: 2 | Issues: 1
Language: Rust | License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 0

minhash-rs

MinHashing done in Rust

Language: Rust | Stargazers: 2 | Issues: 1 | Issues: 0

Contrastive-Inversion

Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

open_lm

A repository for research on medium-sized language models.

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

pytorch_unbg

Removes backgrounds for PyTorch settings

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

tokshuf-rust

Tokenize/Shuffle tooling written in Rust

Language: Rust | License: MIT | Stargazers: 1 | Issues: 1 | Issues: 0

utcs_site

HTML for https://www.cs.utexas.edu/~mjordan/

Language: HTML | Stargazers: 1 | Issues: 2 | Issues: 0

bit-diffusion

Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in PyTorch

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

deduplicate-text-datasets

for decontamination

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

docshuffle-rs

Uses the local-cell mapper pattern to fully shuffle a collection of JSONL documents in Rust

Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0

fastargs

Python library for argument and configuration management

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

parquet-hf-rs

Converts zstd-compressed JSONL files to Parquet in Rust

Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0

ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

reservoir-datastats-rs

Multithreaded reservoir sampling for doc-length (also counts tokens globally :D)

Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0

rust-exact-dedup

Exact deduplication in Rust, with an option to count presence

Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Rust | Stargazers: 0 | Issues: 0 | Issues: 0

sa_decontamination

Suffix-array-based decontamination tools

Language: Rust | Stargazers: 0 | Issues: 1 | Issues: 0

swav-cifar100

PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

text-subsample-rs

Methods for subsampling text datasets (with emphasis on "duplicate-aware subsampling")

Language: Rust | Stargazers: 0 | Issues: 1 | Issues: 0

token-counter-rs

Simple Rust utility to count tokens from tarfiles of contexts

Language: Rust | Stargazers: 0 | Issues: 1 | Issues: 0

wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0