Future Data Systems

Sinkhorn Label Allocation is a label assignment method for semi-supervised self-training algorithms. The SLA algorithm is described in full in this ICML 2021 paper: https://arxiv.org/abs/2102.08622.

Language: Python | License: MIT | 53 8 1

Willump

Willump Is a Low-Latency Useful Machine learning Platform.

Language: Python | License: MIT | 43 11 2

Baleen

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Language: Python | License: MIT | 40 13 5

Uniserve

A runtime implementation of data-parallel actors.

Language: Java | License: MIT | 37 9 2

blazeit

It's BlazeIt because it's blazing fast

Language: C++ | 31 13 6

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | 31 20

ACORN

State-of-the-art search over vector embeddings and structured data (SIGMOD '24)

Language: C++ | License: MIT | 24 80

POP

Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021

Language: Python | License: MIT | 24 7 1

omg

Language: Python | License: Apache-2.0 | 20 90

loa

Public code for LOA

Language: Python | License: Apache-2.0 | 18 7 2

tasti

Semantic Indexes for Machine Learning-based Queries over Unstructured Data (SIGMOD 2022)

Language: Python | 13 8 3

cs245-as1

Student files for CS245 Programming Assignment 1: In-memory data layout

Language: Java | License: Apache-2.0 | 12 90

cs245-as2-public

Language: Scala | 8 90

InQuest

Accelerating Aggregation Queries on Unstructured Streams of Data

Language: Python | 7 8 4

SparseJointShift

Model Performance Estimation and Explanation When Labels and a Few Features Shift

Language: Python | 7 90

sketchstore

Algorithms for compressing and merging large collections of sketches

Language: Jupyter Notebook | License: Apache-2.0 | 5 90

smol

Language: C++ | License: Apache-2.0 | 5 8 1

supg

Language: Python | License: Apache-2.0 | 5 8 2

parallel-lb-simulator

Language: Java | 4 70

abae

Accelerating Approximate Aggregation Queries with Expensive Predicates (VLDB 21)

Language: Python | 3 8 1

ezmode

An iterative algorithm for selecting rare events in large, unlabeled datasets

Language: Python | 1 70

pop-ncflow

Code for POP (SOSP 2021) and NCFlow (NSDI 2021)

Language: Jupyter Notebook | 1 20

teavar

Language: Julia | 010