idearendil / DBSherlock

DBSherlock (Python)

This is a Python implementation of DBSherlock: A Performance Diagnostic Tool for Transactional Databases (SIGMOD 2016).

Environment Setup

Start the Docker container with Docker Compose, then log in to the container

docker compose up -d

Install the Python packages

pip install -r requirements.txt

Prepare Dataset

Download the DBSherlock dataset and convert it to JSON format.

Download DBSherlock Dataset

Download TPCC 16w dataset

wget -P data/original_dataset/ https://github.com/dongyoungy/dbsherlock-reproducibility/raw/master/datasets/dbsherlock_dataset_tpcc_16w.mat

Download TPCC 500w dataset

wget -P data/original_dataset/ https://github.com/dongyoungy/dbsherlock-reproducibility/raw/master/datasets/dbsherlock_dataset_tpcc_500w.mat

Download TPCE 3000 dataset

wget -P data/original_dataset/ https://github.com/dongyoungy/dbsherlock-reproducibility/raw/master/datasets/dbsherlock_dataset_tpce_3000.mat
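If wget is not available, the three downloads above can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `dataset_url` and `download_all` are hypothetical and not part of this repository, while the URLs and target directory are taken from the commands above.

```python
import urllib.request
from pathlib import Path

# Base URL and dataset names copied from the wget commands above.
BASE_URL = (
    "https://github.com/dongyoungy/dbsherlock-reproducibility/raw/master/datasets"
)
DATASETS = ["tpcc_16w", "tpcc_500w", "tpce_3000"]
TARGET_DIR = Path("data/original_dataset")


def dataset_url(name: str) -> str:
    """Return the download URL for one dataset name, e.g. 'tpcc_16w'."""
    return f"{BASE_URL}/dbsherlock_dataset_{name}.mat"


def download_all(target_dir: Path = TARGET_DIR) -> None:
    """Download every dataset that is not already present in target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    for name in DATASETS:
        destination = target_dir / f"dbsherlock_dataset_{name}.mat"
        if destination.exists():
            print(f"skip {destination} (already downloaded)")
            continue
        print(f"downloading {dataset_url(name)}")
        urllib.request.urlretrieve(dataset_url(name), destination)
```

Calling `download_all()` from the repository root should leave the same three .mat files in data/original_dataset/ as the wget commands.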

Data Conversion

Convert TPCC 16w dataset to JSON format

python scripts/data/convert_dataset.py \
--input data/original_dataset/dbsherlock_dataset_tpcc_16w.mat \
--out_dir data/converted_dataset \
--prefix tpcc_16w

Convert TPCC 500w dataset to JSON format

python scripts/data/convert_dataset.py \
--input data/original_dataset/dbsherlock_dataset_tpcc_500w.mat \
--out_dir data/converted_dataset \
--prefix tpcc_500w

Convert TPCE 3000 dataset to JSON format

python scripts/data/convert_dataset.py \
--input data/original_dataset/dbsherlock_dataset_tpce_3000.mat \
--out_dir data/converted_dataset \
--prefix tpce_3000
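The three conversion commands differ only in the dataset name, so they can be driven from a short loop. This is a sketch, not part of the repository: `convert_command` and `convert_all` are hypothetical helper names, and the script path and flags are copied unchanged from the commands above.

```python
import subprocess
import sys

# Dataset names double as the --prefix values used above.
DATASETS = ["tpcc_16w", "tpcc_500w", "tpce_3000"]


def convert_command(name: str) -> list[str]:
    """Build the convert_dataset.py invocation for one dataset name."""
    return [
        sys.executable,
        "scripts/data/convert_dataset.py",
        "--input", f"data/original_dataset/dbsherlock_dataset_{name}.mat",
        "--out_dir", "data/converted_dataset",
        "--prefix", name,
    ]


def convert_all() -> None:
    """Run the conversion for every dataset, stopping at the first failure."""
    for name in DATASETS:
        print("running:", " ".join(convert_command(name)))
        subprocess.run(convert_command(name), check=True)
```

Run `convert_all()` from the repository root after the .mat files have been downloaded. `check=True` makes a failed conversion raise immediately instead of continuing with the next dataset.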

How to load the dataset in Python

Please refer to src/data/README.md
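Before using the repository's loader, it can help to check that a converted file is valid JSON. The sketch below uses only the standard `json` module and makes no assumption about the file's schema, which is documented in src/data/README.md; `summarize_json` is a hypothetical helper, not a function from this repository.

```python
import json
from pathlib import Path


def summarize_json(path: str) -> str:
    """Load a JSON file and describe its top-level structure."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return f"object with keys: {sorted(data)}"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"
```

For example, `print(summarize_json("data/converted_dataset/tpcc_500w_test.json"))` reports whether the top level is an object or an array, which is a quick sanity check that the conversion step produced readable output.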

Visualize Dataset

python scripts/visualize/data.py \
--data data/converted_dataset/tpcc_500w_test.json \
--output results/visualize_data/

The saved time series plots will look like this:

[Figure: example time series plot]

Run Experiments

Experiment 1

Accuracy of Single Causal Models (Figure 7 in the paper)

python scripts/experiments/experiment.py \
--data data/converted_dataset/tpcc_500w_test.json \
--output_dir result/exp1/ \
--exp_id 1

The result plot should look like this:

[Figure: Experiment 1 result plot]

Experiment 2

DBSherlock Predicates versus PerfXplain (Figure 9 in the paper)

python scripts/experiments/experiment.py \
--data data/converted_dataset/tpcc_16w_test.json \
--output_dir result/exp2/ \
--exp_id 2

The result plot should look like this:

[Figure: Experiment 2 result plot]

Experiment 3

Effectiveness of Merged Causal Models (Figure 8 in the paper)

python scripts/experiments/experiment.py \
--data data/converted_dataset/tpcc_500w_test.json \
--output_dir result/exp3/ \
--exp_id 3

The result plot should look like this:

[Figure: Experiment 3 result plot]
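The three experiments can also be run back to back. The following is a minimal sketch: `experiment_command` and `run_all` are hypothetical helper names, while the script path, flags, dataset files, and output directories are copied from the commands above.

```python
import subprocess
import sys

# Experiment id -> dataset file, as used in the commands above.
EXPERIMENTS = {
    1: "data/converted_dataset/tpcc_500w_test.json",
    2: "data/converted_dataset/tpcc_16w_test.json",
    3: "data/converted_dataset/tpcc_500w_test.json",
}


def experiment_command(exp_id: int) -> list[str]:
    """Build the experiment.py invocation for one experiment id."""
    return [
        sys.executable,
        "scripts/experiments/experiment.py",
        "--data", EXPERIMENTS[exp_id],
        "--output_dir", f"result/exp{exp_id}/",
        "--exp_id", str(exp_id),
    ]


def run_all() -> None:
    """Run experiments 1 to 3 in order, stopping at the first failure."""
    for exp_id in sorted(EXPERIMENTS):
        print("running:", " ".join(experiment_command(exp_id)))
        subprocess.run(experiment_command(exp_id), check=True)
```

Call `run_all()` from the repository root once the converted datasets exist. Each experiment writes to its own result/expN/ directory, so the runs do not overwrite each other.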

About

License: Apache License 2.0

