YiBoWang20 / learned-cardinalities

learned cardinalities for databases


Testbed for Cardinality Estimation

Setup

A Docker image is provided to set up PostgreSQL. Update the init script in ./docker to set up the appropriate DB, and supply the relevant authentication args to main.py. For now, it just creates a dummy DB and populates it with synthetic data.

$ cd docker
$ sudo docker build -t card-est .
$ sudo docker run --name card-db -p 5401:5432 -d card-est

Now, you can connect to the db with:

$ psql -U card_est -h localhost -p 5401 -d card_est
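
You can also connect from Python. Below is a minimal sketch using psycopg2; the database name, user, host, and port come from the Docker setup above, while the password is a placeholder, since the real value is whatever the init script in ./docker configures.

# Minimal connection check against the card-db container.
import psycopg2

conn = psycopg2.connect(
    dbname="card_est",
    user="card_est",
    password="card_est",  # assumption: replace with the password set in ./docker
    host="localhost",
    port=5401,
)
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM pg_tables;")
    print(cur.fetchone())
conn.close()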

Steps

  • Create a new DB and generate synthetic data, OR set up a pre-existing DB.
  • Generate interesting cardinality queries (see cardinality_estimation/db_utils/Query) using the templates defined in ./test_templates. These queries can be over a single table or multiple tables.
  • Optionally, generate all possible SubQueries from each Query (only relevant for multi-table joins).
  • Implement an algorithm as a subclass of cardinality_estimation/algs/CardinalityEstimationAlg (see the sketch after this list). The train method can use the CardinalitySamples in the training set (query-feedback-based systems, e.g., QuickSel, neural nets), ignore them and use only the data in the underlying tables (wavelet-based classifiers, PGMs, etc.), or combine both.
  • Compare performance on the test set.
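
The exact interface of CardinalityEstimationAlg is not reproduced here, so the sketch below is illustrative only: the import path is guessed from the directory layout above, the train/test hooks and the sample attributes (true_count, total_count) are assumptions, and should be renamed to match the actual classes in cardinality_estimation.

# Import path is a guess based on the directory layout mentioned above.
from cardinality_estimation.algs import CardinalityEstimationAlg

class AvgSelectivityBaseline(CardinalityEstimationAlg):
    """Toy baseline: predict the mean selectivity observed during training."""

    def __init__(self):
        self.avg_sel = 0.5  # fallback before any training data is seen

    def train(self, db, training_samples, **kwargs):
        # Assumes each CardinalitySample exposes its true cardinality and the
        # total row count of the underlying (joined) relation; rename these
        # fields to match cardinality_estimation/db_utils.
        sels = [s.true_count / s.total_count
                for s in training_samples if s.total_count > 0]
        if sels:
            self.avg_sel = sum(sels) / len(sels)

    def test(self, test_samples, **kwargs):
        # One selectivity estimate per query / subquery.
        return [self.avg_sel for _ in test_samples]

Whatever the real hook names are, the idea is the same: a purely query-feedback method consumes the training samples, a data-driven method ignores them and reads the underlying tables, and hybrid methods do both.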
