yoreei / p2d2

A Transpiler Framework for Optimizing Data Science Pipelines. Published in DEEM@SIGMOD, 18 June 2023. Forever grateful to my advisors at TU-Berlin for showing me the beauty of computer science research.

Home Page: https://www.semanticscholar.org/paper/P2D%3A-A-Transpiler-Framework-for-Optimizing-Data-Grigorov-Gavriilidis/2926d28109772a00b4b7eba099746e1997c20daf?utm_source=direct_link

P2D2 Optimizer

Credits

Yordan Grigorov

Haralampos Gavriilidis

Sergey Redyuk

Presentation link:

https://docs.google.com/presentation/d/1ddOAj5-91dMmr7gBm28uBOANgcL1zlLOv41waXOuHt4/edit?usp=sharing

Thesis link (50-page academic work):

https://drive.google.com/file/d/1EusaxeYp8BGZFN4AVRxKpls9tO-Y2D-B/view?usp=sharing

DEEM paper

Coming soon!

Dir structure

grizzly

  • Scripts that prepare workflows for execution with grizzly
  • Experiments with grizzly

modin

  • Scripts that prepare workflows for execution with modin
  • Experiments with modin

p2d2

  • The product of the thesis
  • "Python PushDown to Database Management System"
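
The pushdown idea behind the name can be illustrated with a minimal sketch. This is not P2D2's API or implementation: the `orders` table, its columns, and both helper functions are hypothetical, and SQLite stands in for the DBMS only because it ships with Python. The sketch contrasts filtering and aggregating in Python after fetching every row with expressing the same computation as SQL so that the database does the work.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.5), (4, "US", 1.0)],
    )
    return conn


def total_client_side(conn, region):
    """No pushdown: fetch the whole table, then filter and sum in Python."""
    rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    return sum(amount for _, row_region, amount in rows if row_region == region)


def total_pushed_down(conn, region):
    """Pushdown: the filter and the aggregate run inside the database."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM orders WHERE region = ?",
        (region,),
    ).fetchone()
    return total


conn = make_demo_db()
# Both strategies compute the same total; only the place of execution differs.
print(total_client_side(conn, "EU"), total_pushed_down(conn, "EU"))
```

In the pushed-down variant only a single value crosses the database boundary instead of the full table, which is the kind of saving a pushdown approach aims for.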

papers

  • PDFs of papers that are/were referenced by the thesis
  • PDFs of papers that I find interesting and could be relevant for the thesis

wflows

  • Data science workflows adapted from Kaggle
  • wflows/qgenenv - contains the qgen binary from TPC-H and its output (result.sql, result2.sql, result3.sql)

data

  • Data for the wflows and the scripts to import it. See the comments in each workflow for the data it requires, or simply import everything.

tpch-kit

  • Submodule containing gregrahn's fork of the TPC-H implementation. I had difficulty compiling the original TPC-H, and this fork promises better PostgreSQL compatibility.

latex

  • The bachelor thesis

retired

  • The "trash bin". Contains code I might not want to delete just yet.
