This is the implementation of COOOL. This repository provides the saved models and the raw data for all experiments in the paper.
- Install PostgreSQL version 12.5
# Download link
wget https://ftp.postgresql.org/pub/source/v12.5/postgresql-12.5.tar.gz
- The configuration of PostgreSQL is given in `postgresql.conf`, which was tuned with PGTune.
- Prepare datasets and workloads.
  Due to license issues, we cannot provide the queries in this repository. All queries of the two datasets should be placed under the `db_benchmark_datasets/` folder. Refer to https://github.com/gregrahn/join-order-benchmark and https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp for join-order-benchmark and TPC-H, respectively. We omit the details of how to load the data into PostgreSQL.
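Before moving on, it can help to confirm that the query files are in place. The sketch below is only an illustration: it assumes the queries are stored as `.sql` files somewhere under `db_benchmark_datasets/` (the exact sub-folder layout is not specified here), and the helper name `count_sql_files` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_sql_files(root):
    """Count .sql files per top-level sub-folder of root (assumed layout)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.sql"):
        relative = path.relative_to(root)
        # Files lying directly under root are grouped under ".".
        group = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[group] += 1
    return dict(counts)


if __name__ == "__main__":
    found = count_sql_files("db_benchmark_datasets")
    for folder, n in sorted(found.items()):
        print(f"{folder}: {n} query files")
```

An empty result means that no `.sql` file was found under the folder, which would be worth checking before executing any queries.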
- Execute queries with each hint set and record the experience.
  - Change the dbname and user in `src/db_util.py` to your own settings.
  - Execute `python split15.py` to split the CREATE VIEW and SELECT operations in TPC-H template #15, because the plan of a query that contains a CREATE VIEW operation cannot be obtained directly.
  - Run `python execute_queries.py --dataset='join-order-benchmark'` to execute join-order-benchmark and record its execution experience.
  - Run `python execute_queries.py --dataset='TPCH'` and `python execute_15.py` to execute TPC-H and record its execution experience.
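As a rough illustration of what executing a query with one hint set can look like, the sketch below builds the SQL statements for a single run. It assumes Bao-style hint sets, where each hint set switches PostgreSQL planner options such as `enable_hashjoin` on or off; the hint sets actually used are defined by the repository's scripts and may differ. The names `PLANNER_FLAGS` and `statements_for_hint_set` are hypothetical.

```python
# These planner options exist in PostgreSQL; which of them the repository
# toggles is an assumption made for this illustration.
PLANNER_FLAGS = (
    "enable_hashjoin",
    "enable_mergejoin",
    "enable_nestloop",
    "enable_indexscan",
    "enable_seqscan",
    "enable_indexonlyscan",
)


def statements_for_hint_set(query, enabled):
    """Return the statements that run `query` under one hint set.

    `enabled` holds the planner flags left on; all others are turned off.
    """
    unknown = set(enabled) - set(PLANNER_FLAGS)
    if unknown:
        raise ValueError(f"unknown planner flags: {sorted(unknown)}")
    statements = [
        f"SET {flag} TO {'on' if flag in enabled else 'off'};"
        for flag in PLANNER_FLAGS
    ]
    # EXPLAIN (ANALYZE, FORMAT JSON) executes the query and reports the
    # chosen plan together with the measured execution time.
    body = query.strip().rstrip(";")
    statements.append(f"EXPLAIN (ANALYZE, FORMAT JSON) {body};")
    return statements


if __name__ == "__main__":
    for stmt in statements_for_hint_set("SELECT 1;", {"enable_hashjoin"}):
        print(stmt)
```

Each returned statement could then be sent through a psycopg2 cursor on a connection opened with the dbname and user configured earlier.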
To facilitate reproduction, we have stored the results of execution on our machine under `record/postgresql/{dataset}/execute_sql/`.
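The split of TPC-H template #15 mentioned above can be pictured with the following sketch. It is not the code of `split15.py`. It assumes that the template consists of semicolon-terminated statements (a CREATE VIEW, a SELECT, and a DROP VIEW), that no semicolon occurs inside a string literal, and that no statement starts with a comment. The function name `split_view_query` is hypothetical, and the example script is a schematic stand-in, not the real query.

```python
def split_view_query(sql_text):
    """Split a multi-statement script into setup, select, and teardown parts."""
    statements = [s.strip() for s in sql_text.split(";") if s.strip()]
    setup, selects, teardown = [], [], []
    for stmt in statements:
        # The first keyword decides which group the statement belongs to.
        keyword = stmt.split(None, 1)[0].upper()
        if keyword == "CREATE":
            setup.append(stmt + ";")
        elif keyword == "DROP":
            teardown.append(stmt + ";")
        else:
            selects.append(stmt + ";")
    return setup, selects, teardown


if __name__ == "__main__":
    script = """
    create view v as select 1 as x;
    select x from v;
    drop view v;
    """
    setup, selects, teardown = split_view_query(script)
    print("setup:", setup)
    print("select:", selects)
    print("teardown:", teardown)
```

With the parts separated, the view can be created first, the SELECT can be planned and timed on its own, and the view can be dropped afterwards.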
- Python 3.8 with the following packages
torch==1.12.0 (w/ cuda, refer to https://download.pytorch.org/whl/torch/)
sklearn==0.0.post1
scipy==1.9.3
scikit-learn==1.2.0
psycopg2-binary==2.9.5
pandas==1.5.2
numpy==1.22.0
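To check an environment against these pins, a small script along the following lines may be convenient. It is a sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and it relies only on `importlib.metadata` from the standard library (available from Python 3.8).

```python
from importlib import metadata

# Pinned versions copied from the package list above.
PINNED = {
    "torch": "1.12.0",
    "sklearn": "0.0.post1",
    "scipy": "1.9.3",
    "scikit-learn": "1.2.0",
    "psycopg2-binary": "2.9.5",
    "pandas": "1.5.2",
    "numpy": "1.22.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA builds of torch report versions such as "1.12.0+cu113",
        # so the local build tag after "+" is ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

The script prints nothing when every pinned package is installed at the listed version.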
The raw data (training logs) are stored in the corresponding folders under `record/postgresql`. We ran each experiment 10 times, so every experiment has 10 saved models. The results reported in the paper can be obtained by analyzing the raw data directly.
python ana_single.py # the single instance experiments
python ana_cross.py # cross workload experiments
python ana_unione.py # unified model experiments
The inference stage of every model can also be run.
bash inference.sh
This script generates the inference log of each experimental setting by executing the saved models. There are 60 experimental settings, so 60 inference logs are produced and 600 saved models are executed. We use some examples to illustrate the meaning of the arguments; the commands for all experiments are listed in `inference.sh`.
# inference in the single instance setting
python3 -u run.py
--exp='single' # ['single', 'cross', 'one'], three experimental settings
--app='pair' # ['Bao', 'pair', 'list'], three approaches involved in the paper
--grain='adhoc' # ['adhoc', 'repeat'], two experimental scenarios
--dataset='TPCH' # ['join-order-benchmark', 'TPCH'], two datasets
--slow=1 # [0, 1], slow split or not
# inference in the cross workload setting
python3 -u run.py
--exp='cross' # cross workload setting
--src='TPCH' # source workload
--tgt='JOB' # target workload, 'join-order-benchmark' and 'JOB' are equivalent
--app='pair' # the pairwise approach
--grain='adhoc' # adhoc scenario
--slow=1 # slow split
# inference in the unified model setting
python3 -u run.py
--exp='one' # unified model setting
--app='pair' # the pairwise approach
--grain='adhoc' # adhoc scenario
--slow=1 # slow split
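The figure of 60 experimental settings is consistent with taking every combination of the argument values listed above. The sketch below enumerates them under two assumptions that should be checked against `inference.sh`: the cross workload setting covers both source/target directions, and the unified model setting takes no dataset argument. Whether `--src` accepts `'JOB'` is likewise assumed. The helper names `enumerate_settings` and `to_command` are hypothetical.

```python
from itertools import product

APPROACHES = ["Bao", "pair", "list"]
GRAINS = ["adhoc", "repeat"]
DATASETS = ["join-order-benchmark", "TPCH"]
SLOW = [0, 1]


def enumerate_settings():
    """List argument dictionaries for every assumed experimental setting."""
    settings = []
    for app, grain, slow in product(APPROACHES, GRAINS, SLOW):
        # Single instance: one setting per dataset.
        for dataset in DATASETS:
            settings.append({"exp": "single", "app": app, "grain": grain,
                             "dataset": dataset, "slow": slow})
        # Cross workload: assumed to cover both source/target directions.
        for src, tgt in [("TPCH", "JOB"), ("JOB", "TPCH")]:
            settings.append({"exp": "cross", "src": src, "tgt": tgt,
                             "app": app, "grain": grain, "slow": slow})
        # Unified model: assumed to need no dataset argument.
        settings.append({"exp": "one", "app": app, "grain": grain,
                         "slow": slow})
    return settings


def to_command(setting):
    """Format one setting as a run.py command line."""
    parts = []
    for key, value in setting.items():
        # String values are quoted as in the examples above.
        text = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"--{key}={text}")
    return "python3 -u run.py " + " ".join(parts)


if __name__ == "__main__":
    settings = enumerate_settings()
    print(len(settings), "settings,", len(settings) * 10, "saved models")
    print(to_command(settings[0]))
```

Under these assumptions there are 12 combinations of approach, scenario, and slow split, each yielding 2 single instance, 2 cross workload, and 1 unified setting, which gives 12 × 5 = 60 settings and, at 10 runs each, 600 saved models.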