DBA Bandits

We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation.

Cite: R Malinga Perera, Bastian Oetomo, Benjamin IP Rubinstein, and Renata Borovica-Gajic. DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 600–611. IEEE, 2021.
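The bandit used here (see mab_versions below, e.g. simulation.sim_c3ucb_vR) is a contextual combinatorial UCB-style learner: each candidate index is an arm described by a feature vector, and arms are scored by an estimated benefit plus an exploration bonus. The following is a minimal LinUCB-style sketch for illustration only; it is not the repository's implementation, and the feature vectors and observed benefit are stand-ins. The alpha and lambda values correspond to the input_alpha and input_lambda parameters explained further down.

```python
# Illustrative LinUCB-style scoring (NOT the repository's C3-UCB code).
# theta is a ridge-regression estimate of per-feature benefit; alpha scales the
# optimistic exploration bonus; lambda_ is the ridge regulariser. These map to
# input_alpha / input_lambda in config/exp.conf.
import numpy as np

def ucb_scores(contexts, V, b, alpha):
    """contexts: (n_arms, d) feature matrix; V: (d, d); b: (d,)."""
    V_inv = np.linalg.inv(V)
    theta = V_inv @ b                                    # current benefit estimate
    exploit = contexts @ theta
    explore = alpha * np.sqrt(np.einsum("ij,jk,ik->i", contexts, V_inv, contexts))
    return exploit + explore                             # optimism under uncertainty

d, alpha, lambda_ = 4, 1.0, 0.5
V, b = lambda_ * np.eye(d), np.zeros(d)                  # ridge prior

contexts = np.random.rand(3, d)                          # 3 hypothetical candidate indices
chosen = int(np.argmax(ucb_scores(contexts, V, b, alpha)))

observed_benefit = 1.0                                   # e.g. measured execution-time saving
x = contexts[chosen]
V += np.outer(x, x)                                      # standard LinUCB update
b += observed_benefit * x
```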

How to run

  1. Setting up the environment
    1. You need to generate data for your benchmark
    2. You need to generate queries for your benchmark
    3. You need to create a workload file for your benchmark (example workload files can be found in the resources/workloads folder)
    4. Note that the workload file also includes the predicates and payload of each query (a hedged example entry is sketched after this list)
    5. Add DB connection details to config/db.conf
  2. Setting up your experiment. Our framework lets you easily set up experiments in config/exp.conf
    1. See the examples in config/exp.conf
    2. Check the explanation under 'Experiment Config Explained' below
  3. The program's entry point is simulation/sim_run_experiment.py; this is the file you will run. Simply add the list of experiments you want to run and execute the file
  4. Results
    1. Results will be saved in a sub-folder under the experiments folder; the sub-folder is named after the experiment.
    2. Results include graphs, a CSV of the main results, and a pickle file of all the data.
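For steps 1.3 and 1.4 above, the sketch below shows what a single workload-file entry could look like. The field names used here (id, query_string, predicates, payload) are assumptions made purely for illustration; consult the example files under resources/workloads for the exact schema the framework expects.

```python
# Hypothetical workload-file entry (field names are assumptions, not the exact
# schema); the real files under resources/workloads include the predicates and
# payload columns of each query, as noted in step 1.4.
import json

example_query = {
    "id": 1,
    "query_string": "SELECT l_orderkey, SUM(l_extendedprice) FROM lineitem "
                    "WHERE l_shipdate <= '1998-09-01' GROUP BY l_orderkey",
    "predicates": {"lineitem": ["l_shipdate"]},                   # columns used in filters/joins
    "payload": {"lineitem": ["l_orderkey", "l_extendedprice"]},   # columns only read
}

with open("my_workload.json", "w") as f:
    json.dump([example_query], f, indent=2)
```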

Experiment Config Explained

Example

  1. [tpc_h_static_10_MAB]
  2. reps = 1
  3. rounds = 25
  4. hyp_rounds = 0
  5. workload_shifts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  6. queries_start = [0, 21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504]
  7. queries_end = [21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504, 525]
  8. ta_runs = [1]
  9. ta_workload = optimal
  10. workload_file = \resources\workloads\tpc_h_static_100.json
  11. config_shifts = [0]
  12. config_start = [0]
  13. config_end = [20]
  14. max_memory = 25000
  15. input_alpha = 1
  16. input_lambda = 0.5
  17. time_weight = 5
  18. memory_weight = 0
  19. components = ["MAB"]
  20. mab_versions = ["simulation.sim_c3ucb_vR"]

Explanation

  1. Name of the experiment
  2. Number of times we run the experiment
  3. Number of bandit rounds in the experiment
  4. Ignore
  5. Rounds at which the workload shifts (even in a static workload we run different query templates in different rounds, so this simply lists every round from 0 to rounds - 1)
  6. For each round we run the set of queries from the workload file, starting at the queries_start index and running up to the ...
  7. queries_end index (see the parsing sketch after this list)
  8. Rounds where we invoke PDTool
  9. Use optimal for static and dynamic shifting workloads, and last_run for dynamic random workloads
  10. Location of the workload file
  11. Ignore
  12. Ignore
  13. Ignore
  14. Memory limit in MB
  15. Bandit parameter alpha
  16. Bandit parameter lambda
  17. Ignore
  18. Ignore
  19. Components you need to compare in this experiment (subset of MAB, TA_OPTIMAL, NO_INDEX, ...)
  20. MAB version to run (module path of the MAB implementation file)
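To make the numbered explanation above concrete, the sketch below reads the example section with Python's standard configparser and shows how the per-round query ranges line up: in round i, the queries from queries_start[i] up to (but not including) queries_end[i] are taken from the workload file. This is an illustrative reading of the config, not the framework's own parsing code, and the half-open interpretation of the ranges is an assumption based on the example values.

```python
# Illustrative only -- not the framework's parser. Section and key names are taken
# from the example [tpc_h_static_10_MAB] section shown above.
import ast
import configparser

exp_conf = configparser.ConfigParser()
exp_conf.read("config/exp.conf")
exp = exp_conf["tpc_h_static_10_MAB"]

rounds = exp.getint("rounds")                           # 25
queries_start = ast.literal_eval(exp["queries_start"])  # [0, 21, 42, ...]
queries_end = ast.literal_eval(exp["queries_end"])      # [21, 42, 63, ...]

for rnd in range(rounds):
    s, e = queries_start[rnd], queries_end[rnd]
    # Round `rnd` replays queries s .. e-1 (21 TPC-H queries per round in this example).
    print(f"round {rnd}: queries {s}..{e - 1}")
```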

License: GNU General Public License v3.0

