Future Data Systems (stanford-futuredata)

We are a CS research group building data-intensive systems

Location: Stanford, CA

Home Page: http://futuredata.stanford.edu/

Future Data Systems's repositories

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python, License: MIT, Stargazers: 2587, Issues: 41, Issues: 250

macrobase

MacroBase: A Search Engine for Fast Data

Language: Java, License: Apache-2.0, Stargazers: 658, Issues: 55, Issues: 77

Language: Python, License: Apache-2.0, Stargazers: 338, Issues: 10, Issues: 15

FAST

End-to-end earthquake detection pipeline via efficient time series similarity search

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 145, Issues: 30, Issues: 20

FrugalGPT

FrugalGPT: better quality and lower cost for LLM applications

Language: Python, License: Apache-2.0, Stargazers: 140, Issues: 12, Issues: 3

gavel

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020

Language: Jupyter Notebook, License: MIT, Stargazers: 120, Issues: 11, Issues: 24

Language: Python, License: Apache-2.0, Stargazers: 69, Issues: 3, Issues: 3

sinkhorn-label-allocation

Sinkhorn Label Allocation is a label assignment method for semi-supervised self-training algorithms. The SLA algorithm is described in full in this ICML 2021 paper: https://arxiv.org/abs/2102.08622.

Language: Python, License: MIT, Stargazers: 53, Issues: 8, Issues: 1

Willump

Willump Is a Low-Latency Useful Machine learning Platform.

Language: Python, License: MIT, Stargazers: 43, Issues: 12, Issues: 2

Baleen

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Language: Python, License: MIT, Stargazers: 40, Issues: 13, Issues: 5

Uniserve

A runtime implementation of data-parallel actors.

Language: Java, License: MIT, Stargazers: 37, Issues: 9, Issues: 2

Megatron-LM

Ongoing research training transformer models at scale

Language: Python, License: NOASSERTION, Stargazers: 31, Issues: 2, Issues: 0

blazeit

It's BlazeIt because it's blazing fast

POP

Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021

Language: Python, License: MIT, Stargazers: 24, Issues: 7, Issues: 1

Language: Python, License: Apache-2.0, Stargazers: 20, Issues: 9, Issues: 0

loa

Public code for LOA

Language: Python, License: Apache-2.0, Stargazers: 18, Issues: 7, Issues: 2

tasti

Semantic Indexes for Machine Learning-based Queries over Unstructured Data (SIGMOD 2022)

cs245-as1

Student files for CS245 Programming Assignment 1: In-memory data layout

Language: Java, License: Apache-2.0, Stargazers: 12, Issues: 9, Issues: 0

InQuest

Accelerating Aggregation Queries on Unstructured Streams of Data

SparseJointShift

Model Performance Estimation and Explanation When Labels and a Few Features Shift

Language: Python, Stargazers: 7, Issues: 9, Issues: 0

sketchstore

Algorithms for compressing and merging large collections of sketches

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 5, Issues: 9, Issues: 0

Language: C++, License: Apache-2.0, Stargazers: 5, Issues: 8, Issues: 1

Language: Python, License: Apache-2.0, Stargazers: 5, Issues: 8, Issues: 2

abae

Accelerating Approximate Aggregation Queries with Expensive Predicates (VLDB 2021)

ezmode

An iterative algorithm for selecting rare events in large, unlabeled datasets

Language: Python, Stargazers: 1, Issues: 7, Issues: 0

pop-ncflow

Code for POP (SOSP 2021) and NCFlow (NSDI 2021)

Language: Jupyter Notebook, Stargazers: 1, Issues: 2, Issues: 0

redisgeo-bench

Simple benchmark for Redis geosets for top-k queries.

Language: Rust, Stargazers: 0, Issues: 9, Issues: 0

Language: Julia, Stargazers: 0, Issues: 1, Issues: 0