Future Data Systems (stanford-futuredata)

We are a CS research group building data-intensive systems

Location: Stanford, CA

Home Page: http://futuredata.stanford.edu/

Future Data Systems's repositories

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python, License: MIT, Stargazers: 2686, Issues: 42, Issues: 252

macrobase

MacroBase: A Search Engine for Fast Data

Language: Java, License: Apache-2.0, Stargazers: 659, Issues: 55, Issues: 77
Language: Python, License: Apache-2.0, Stargazers: 372, Issues: 10, Issues: 21

FrugalGPT

FrugalGPT: better quality and lower cost for LLM applications

Language: Python, License: Apache-2.0, Stargazers: 150, Issues: 12, Issues: 3

FAST

End-to-end earthquake detection pipeline via efficient time series similarity search

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 144, Issues: 30, Issues: 20

gavel

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020

Language: Jupyter Notebook, License: MIT, Stargazers: 122, Issues: 11, Issues: 24
Language: Python, License: Apache-2.0, Stargazers: 71, Issues: 3, Issues: 5

sinkhorn-label-allocation

Sinkhorn Label Allocation is a label assignment method for semi-supervised self-training algorithms. The SLA algorithm is described in full in this ICML 2021 paper: https://arxiv.org/abs/2102.08622.

Language: Python, License: MIT, Stargazers: 53, Issues: 8, Issues: 1

Willump

Willump Is a Low-Latency Useful Machine learning Platform.

Language: Python, License: MIT, Stargazers: 43, Issues: 11, Issues: 2

Baleen

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Language: Python, License: MIT, Stargazers: 40, Issues: 13, Issues: 5

Uniserve

A runtime implementation of data-parallel actors.

Language: Java, License: MIT, Stargazers: 37, Issues: 9, Issues: 2

blazeit

It's BlazeIt because it's blazing fast

Megatron-LM

Ongoing research training transformer models at scale

Language: Python, License: NOASSERTION, Stargazers: 31, Issues: 2, Issues: 0

ACORN

State-of-the-art search over vector embeddings and structured data (SIGMOD '24)

Language: C++, License: MIT, Stargazers: 24, Issues: 8, Issues: 0

POP

Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021

Language: Python, License: MIT, Stargazers: 24, Issues: 7, Issues: 1
Language: Python, License: Apache-2.0, Stargazers: 20, Issues: 9, Issues: 0

loa

Public code for LOA

Language: Python, License: Apache-2.0, Stargazers: 18, Issues: 7, Issues: 2

tasti

Semantic Indexes for Machine Learning-based Queries over Unstructured Data (SIGMOD 2022)

cs245-as1

Student files for CS245 Programming Assignment 1: In-memory data layout

Language: Java, License: Apache-2.0, Stargazers: 12, Issues: 9, Issues: 0

InQuest

Accelerating Aggregation Queries on Unstructured Streams of Data

SparseJointShift

Model Performance Estimation and Explanation When Labels and a Few Features Shift

Language: Python, Stargazers: 7, Issues: 9, Issues: 0

sketchstore

Algorithms for compressing and merging large collections of sketches

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 5, Issues: 9, Issues: 0
Language: C++, License: Apache-2.0, Stargazers: 5, Issues: 8, Issues: 1
Language: Python, License: Apache-2.0, Stargazers: 5, Issues: 8, Issues: 2

abae

Accelerating Approximate Aggregation Queries with Expensive Predicates (VLDB '21)

ezmode

An iterative algorithm for selecting rare events in large, unlabeled datasets

Language: Python, Stargazers: 1, Issues: 7, Issues: 0

pop-ncflow

Code for POP (SOSP 2021) and NCFlow (NSDI 2021)

Language: Jupyter Notebook, Stargazers: 1, Issues: 2, Issues: 0
Language: Julia, Stargazers: 0, Issues: 1, Issues: 0