stefan-grafberger

University of Amsterdam

Amsterdam

https://stefan-grafberger.com

Stefan Grafberger's repositories

mlinspect

Inspect ML Pipelines in Python in the form of a DAG

Language: Python | License: Apache-2.0 | 68 5 52

mlwhatif

Data-Centric What-If Analysis for Native Machine Learning Pipelines

Language: Jupyter Notebook | License: Apache-2.0 | 14 3 23

StreamDQ

StreamDQ is a library built on top of Apache Flink for defining "unit tests for data", which measure data quality in large data streams.

Language: Kotlin | License: Apache-2.0 | 1000

mlinspect-cidr

Inspect ML Pipelines in Python in the form of a DAG (CIDR Submission version)

Language: Python | License: Apache-2.0 | 5 1 1

deem22-what-if-experiments

Language: Jupyter Notebook | License: GPL-3.0 | 1 30

csvmatch

🔎 Finds fuzzy matches between CSV spreadsheets

Language: Python | License: NOASSERTION | 000

datawig

Imputation of missing values in tables.

Language: Python | License: Apache-2.0 | 010

dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

Language: Python | License: MIT | 000

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Language: Scala | License: Apache-2.0 | 010

duckdq

Language: Python | License: MIT | 000

hackathon-2021-1

Rust Rust Rust!

Language: Rust | 000

latex-make-action

Action for compiling LaTeX with make

Language: Makefile | 000

learnedcardinalities

Code and workloads from the Learned Cardinalities paper (https://arxiv.org/abs/1809.00677)

Language: Python | License: MIT | 000

jenga

Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.

Language: Jupyter Notebook | License: GPL-3.0 | 000

ml-pipeline-datasets

Some datasets for ML pipelines that I want to use for some experiments

Language: Jupyter Notebook | 010

mlinspect-demo

License: Apache-2.0 | 000

mlinspect-exploratory-user-study

The files for an initial exploratory user study, which provides the foundation for a larger user study in future work.

Language: Jupyter Notebook | 000

noworkflow

Supporting infrastructure to run scientific experiments without a scientific workflow management system.

Language: Jupyter Notebook | License: MIT | 000

pgbm

Probabilistic Gradient Boosting Machines

Language: Python | License: Apache-2.0 | 000

plantestic

Language: Kotlin | License: Apache-2.0 | 000

shadow-pipeline-experiments

Language: Python | License: Apache-2.0 | 000

st-cytoscape

A fork to add dagre layout support

Language: Python | License: MIT | 000

uni-project-code

Language: Kotlin | 000