Advanced Systems Lab - Fall 2017

This repository contains the middleware source code, aggregated data, final plots, bash scripts for running the system on Azure, Python scripts that extract aggregated data from raw data, and MATLAB code for the queueing models.

author: Jovan Nikolic

ETH, Zurich, Switzerland

Final report is available here.

Project description is available here.

Organization of data, scripts and plots

The raw data obtained from the experiments is around 160 GB, so it is not uploaded to the repository. However, data from the middleware logs, aggregated over 1-second windows, is uploaded and can be found in the aggregated_data folder. Data from the dstat tool, ping results and logs from the memtier clients are not uploaded either. All data is available on request.

Organization of data

In the aggregated_data folder, data is organized per experiment as follows:

baseline_1midd contains data from the baseline experiment with 1 middleware
baseline_2midd contains data from the baseline experiment with 2 middlewares
baseline_2midd_2vms contains data from the repetition of the experiment with 2 middlewares under write-only load with 2 client machines
experiment_write-only contains data from the write-only experiments from Section 4
experiment_gets contains data from the gets and multi-gets experiments from Section 5
experiment_2k contains data from the experiments conducted for the 2k analysis
queueing_theory contains data extracted from previous experiments, used as input values for the queueing models
baseline_nomidd_1server contains raw data from the baseline experiment without middleware, with 1 server
baseline_nomidd_2servers contains raw data from the baseline experiment without middleware, with 2 servers

In baseline_nomidd_1server the following naming convention applies:

json_output_file_cpt_X_repY_ZZZZ_vmN.json is the output file of a memtier instance, where X is the number of virtual clients per thread at memtier, Y is the repetition number = {1, 2, 3}, ZZZZ is S0-G1 for read-only load and S1-G0 for write-only load, and N is the virtual machine id = {1, 2, 3}.
dstat_machineN_cptX_repY_S.csv is the output file of the dstat tool, where machine = {"memtier", "memcached"}, N is the id of the machine instance, X is the number of virtual clients per thread at memtier, Y is the repetition number = {1, 2, 3} and S is "r" for read-only and "w" for write-only. Ping results are identified in a similar way.
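
The naming convention above can be decoded mechanically. The following is a minimal sketch, not part of the repository scripts: the function name and the returned field names are assumptions made for illustration, and only the memtier JSON filename pattern is covered.

```python
import re

# Pattern for: json_output_file_cpt_X_repY_ZZZZ_vmN.json
MEMTIER_JSON = re.compile(
    r"^json_output_file_cpt_(?P<cpt>\d+)_rep(?P<rep>\d+)"
    r"_(?P<load>S0-G1|S1-G0)_vm(?P<vm>\d+)\.json$"
)

# ZZZZ tag to load type, as described in the naming convention
LOADS = {"S0-G1": "read-only", "S1-G0": "write-only"}


def parse_memtier_json_name(filename):
    """Return the fields encoded in a memtier output filename, or None."""
    match = MEMTIER_JSON.match(filename)
    if match is None:
        return None
    return {
        "clients_per_thread": int(match.group("cpt")),
        "repetition": int(match.group("rep")),
        "load": LOADS[match.group("load")],
        "vm": int(match.group("vm")),
    }


print(parse_memtier_json_name("json_output_file_cpt_14_rep2_S1-G0_vm3.json"))
```

Filenames that do not follow the convention yield None, which makes it easy to skip dstat and ping files when walking the folder.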

In baseline_nomidd_2servers the following naming convention applies:

For dstat logs and ping results, the convention is the same as above.
json_output_file_instA_cptX_repY_ZZZZ_vmN.json is the output file of a memtier client, where A is the instance id = {1, 2}, X is the number of virtual clients per thread at memtier, Y is the repetition number = {1, 2, 3}, ZZZZ is S0-G1 for read-only load and S1-G0 for write-only load, and N is the virtual machine id = {1, 2, 3}.

In the queueing_theory folder, the files noq_1midd.txt and noq_2midd.txt contain the service times that are the input for the network of queues models with 1 and 2 middlewares, and mm_values.ods is a LibreOffice spreadsheet (comparable to MS Excel) with sign tables for the M/M/1 and M/M/m models.

In all other folders, data is divided into timers and counters folders.

The timers folder contains aggregated time-related data such as response time, server service time and net-thread processing time. Files are named timer_aggregated_data_clientThreads_X_workerThreads_M_ZZZZ.csv, where X is the number of virtual clients per thread at memtier, M is the number of workers per middleware and ZZZZ is S0-G1 for read-only load and S1-G0 for write-only load.

The counters folder contains throughput aggregates. Files are named similarly: throughput_X_workerThreads_M_ZZZZ.csv, with X, M and ZZZZ as above.

When there are multiple middleware instances, the ID of the middleware is added to the filename with the tag mw. In the experiment_gets folder, the filename also includes human-readable information about the number of keys and whether sharding is enabled.
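
To locate an aggregated file for a given configuration, the folder layout and naming convention can be combined into a path. This is a sketch under assumptions: the helper names are hypothetical, the mw tag used with multiple middleware instances is not handled because its exact position in the filename is not specified here, and the extra fields used in experiment_gets are likewise left out.

```python
from pathlib import PurePosixPath

VALID_LOADS = ("S0-G1", "S1-G0")  # read-only, write-only


def timer_file(experiment, clients_per_thread, workers, load):
    """Path to a timers file for a single-middleware experiment."""
    if load not in VALID_LOADS:
        raise ValueError(f"unknown load tag: {load}")
    name = (
        f"timer_aggregated_data_clientThreads_{clients_per_thread}"
        f"_workerThreads_{workers}_{load}.csv"
    )
    return PurePosixPath("aggregated_data", experiment, "timers", name)


def throughput_file(experiment, clients_per_thread, workers, load):
    """Path to a counters (throughput) file for a single-middleware experiment."""
    if load not in VALID_LOADS:
        raise ValueError(f"unknown load tag: {load}")
    name = f"throughput_{clients_per_thread}_workerThreads_{workers}_{load}.csv"
    return PurePosixPath("aggregated_data", experiment, "counters", name)


print(timer_file("baseline_1midd", 32, 64, "S1-G0"))
print(throughput_file("baseline_1midd", 32, 64, "S1-G0"))
```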

Organization of plots

Plots are organized in separate folders per experiment, and the folders are named the same way as in the aggregated_data folder. Inside each of these folders there is a folder called timers, which holds all final tables and plots used for the middleware-related figures in the report, including the throughput figures. The final tables and plots for memtier are located outside of that folder. Filenames are human-readable.

All plots are generated using gnuplot, and all scripts are next to the files they use for plotting.

Global figures and diagrams, such as the flow chart, the architecture of the system and the illustrations of the network of queues models, are located directly in the plots folder.

Organization of scripts

Bash shell scripts used for running experiments on Azure are located next to the src folder: baseline_1midd.sh, baseline_2mid.sh, baseline_2midd_repeating.sh, baseline_nomidd_1server.sh, baseline_nomidd_2servers.sh, experiments_2k.sh, experiment_gets.sh, experiment_write-only.sh. The scripts do not take any input parameters; all parameters are specified inside the scripts. To rerun them, the path to the known_hosts file must be updated. The scripts assume that the virtual machines are already running.

Python scripts for analyzing the middleware logs are located next to the src folder as well. They require the raw data, which is currently not uploaded to the repository. The naming convention is simple: scripts starting with process_middleware_timers extract and aggregate time-related data from the middlewares, such as response time and server service time; scripts starting with process_middleware_counters extract throughput from the middleware logs; scripts starting with process_memtier_data extract data from the memtier logs.

Scripts starting with process_aggregated_data use data from aggregated_data folder to create final tables for plotting, and to aggregate data between repetitions.
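
As an illustration of aggregating between repetitions, the sketch below reduces one measurement taken in each repetition to a mean and a sample standard deviation. Which statistics the process_aggregated_data scripts actually compute is not stated here, so the choice of statistics, the function name and the input values are all assumptions.

```python
from statistics import mean, stdev


def aggregate_repetitions(values):
    """Return (mean, sample standard deviation) over the repetitions."""
    if len(values) < 2:
        raise ValueError("need at least 2 repetitions for a standard deviation")
    return mean(values), stdev(values)


# Hypothetical throughput values (ops/s) from 3 repetitions of one configuration
avg, spread = aggregate_repetitions([10.0, 12.0, 14.0])
print(avg, spread)
```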

Report folder

The report folder contains the *.tex file of the report.

Middleware code

Middleware code is located in the src/ch/ethz folder. Inside it, the asltest folder contains all main classes that build the middleware, the instrumentation folder contains utilities used for logging, and the utils folder contains various utility classes.

Experiments Journal:

All experiments have these parameters in common:

CT is the number of threads (set by --threads or -t), VC is the number of virtual clients per thread (set by --clients or -c), CPM = CT * VC is the number of virtual clients per memtier instance, and the final number of clients is NumClients = NumInstances * CPM = NumInstances * CT * VC
choose at least 6 points in range [1, 32] for VC value
--data-size=1024
--key-maximum=10000
--expiry-range=9999-10000
--random-data
repeated at least 3 times for statistical significance

each run should have a stable phase of at least 60 seconds
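
The client arithmetic and the common memtier parameters listed above can be sketched as follows. This is only an illustration: the function names are hypothetical, the executable name memtier_benchmark is assumed to be on the PATH, and server address, port, protocol and load ratio are deliberately left out because they differ per experiment.

```python
def total_clients(num_instances, ct, vc):
    """NumClients = NumInstances * CT * VC."""
    cpm = ct * vc  # virtual clients per memtier instance
    return num_instances * cpm


def common_memtier_args(ct, vc):
    """Argument list with the parameters shared by all experiments."""
    return [
        "memtier_benchmark",  # assumed executable name
        "-t", str(ct),        # CT: threads per instance
        "-c", str(vc),        # VC: virtual clients per thread
        "--data-size=1024",
        "--key-maximum=10000",
        "--expiry-range=9999-10000",
        "--random-data",
    ]


# 3 memtier VMs with 2 instances each, 1 thread per instance, 32 clients per thread
print(total_clients(num_instances=3 * 2, ct=1, vc=32))
print(" ".join(common_memtier_args(ct=1, vc=32)))
```

With 6 instances, 1 thread each and VC = 32, the system is loaded with 192 virtual clients in total.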

*name*	*type*	*for what*
foraslvms1	A2 2vcpus, 3.5 GB	memtier
foraslvms2	A2 2vcpus, 3.5 GB	memtier
foraslvms3	A2 2vcpus, 3.5 GB	memtier
foraslvms4	A4 8vcpus, 14 GB	middleware
foraslvms5	A4 8vcpus, 14 GB	middleware
foraslvms6	A1 1vcpus, 1.75 GB	memcached
foraslvms7	A1 1vcpus, 1.75 GB	memcached
foraslvms8	A1 1vcpus, 1.75 GB	memcached

Experiments outline:

1.1. Baseline without Middleware, 1 server can be found in data\baseline_nomidd_1server

*name*	*value*
memtier VMs	3
memtier instances per VM	1
memtier threads per instance	2
memtier virtual clients per thread	(1 5 9 14 19 23 28 32)
memtier actual virtual clients per thread	(1 5 9 14 19 23 28 32 37 47 52)
memcached VMs	1
memcached instances per VM	1
load	read-only and write-only

1.2. Baseline without Middleware, 2 servers can be found in data\baseline_nomidd_2servers

*name*	*value*
memtier VMs	1
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	(1 5 9 14 19 23 28 32)
memtier actual virtual clients per thread	(1 5 9 14 19 23 28 32 37 47 52)
memcached VMs	2
memcached instances per VM	1
load	read-only and write-only

2.1. Baseline with 1 Middleware can be found in data\baseline_1midd

*name*	*value*
memtier VMs	1
memtier instances per VM	1
memtier threads per instance	2
memtier virtual clients per thread	(1 5 9 14 19 23 28 32)
memtier actual virtual clients per thread	(1 5 8 14 19 23 28 32 42 52 64)
middleware VMs	1
middleware instances per VM	1
middleware threads per instance	(8, 16, 32, 64)
memcached VMs	1
memcached instances per VM	1
load	read-only and write-only

2.2. Baseline with 2 Middlewares can be found in data\baseline_2midd

*name*	*value*
memtier VMs	1
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	(1 5 9 14 19 23 28 32)
memtier actual virtual clients per thread	(1 5 8 14 19 23 28 32 42 52 64)
middleware VMs	2
middleware instances per VM	1
middleware threads per instance	(8, 16, 32, 64)
memcached VMs	1
memcached instances per VM	1
load	read-only and write-only

3. Throughput for Writes can be found in data\experiment_write-only

*name*	*value*
memtier VMs	3
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	(1 5 9 14 19 23 28 32)
memtier actual virtual clients per thread	(1 5 8 14 19 23 28 32 42 52 64)
middleware VMs	2
middleware instances per VM	1
middleware threads per instance	(8, 16, 32, 64)
memcached VMs	3
memcached instances per VM	1
load	write-only

4.1. GETs and multi-GETs, Sharded Case can be found in data\experiment_gets

*name*	*value*
memtier VMs	3
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	2
middleware VMs	2
middleware instances per VM	1
middleware threads per instance	64
memcached VMs	3
memcached instances per VM	1
load	read-only with (1 3 6 9) keys, sharded *enabled*

4.2. GETs and multi-GETs, Non-Sharded Case can be found in data\experiment_gets

*name*	*value*
memtier VMs	3
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	2
middleware VMs	2
middleware instances per VM	1
middleware threads per instance	64 (or any lower number of threads that gives max throughput)
memcached VMs	3
memcached instances per VM	1
load	read-only with (1 3 6 9) keys, sharded *disabled*

5. 2k Analysis can be found in data\2k_analysis

*name*	*value*
memtier VMs	3
memtier instances per VM	2
memtier threads per instance	1
memtier virtual clients per thread	32
middleware VMs	(1 2)
middleware instances per VM	1
middleware threads per instance	(8 32)
memcached VMs	(2 3)
memcached instances per VM	1
load	write-only, read-only and 50-50-read-write, all single-keyed GET requests
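
The three two-level factors in the table above (middleware VMs, middleware threads, memcached VMs) form a 2^3 design. The sketch below enumerates the 8 configurations together with their sign-table rows and shows how a main effect would be computed from measured responses. The factor labels and function names are assumptions, and no measured values are included.

```python
from itertools import product

# Low and high level of each factor, taken from the table above
FACTORS = {
    "middlewares": (1, 2),
    "worker_threads": (8, 32),
    "servers": (2, 3),
}


def sign_table(factors):
    """Return a list of (signs, config) rows for a full 2^k design."""
    names = list(factors)
    rows = []
    for signs in product((-1, 1), repeat=len(names)):
        # -1 selects the low level, +1 the high level
        config = {
            name: factors[name][0 if sign == -1 else 1]
            for name, sign in zip(names, signs)
        }
        rows.append((signs, config))
    return rows


def main_effect(rows, responses, factor_index):
    """Main effect of one factor: sum(sign * y) / 2^k."""
    total = sum(signs[factor_index] * y for (signs, _), y in zip(rows, responses))
    return total / len(rows)


for signs, config in sign_table(FACTORS):
    print(signs, config)
```

Each configuration is run once per load type (write-only, read-only and 50-50 read-write), and the responses passed to main_effect must be in the same order as the rows.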

Color Palette:

#D43849
#63CB9D
#0B547E
#AF71FC
#C4B205

Tried these as well: #63cb9d, #1a4b4d, #99c042, #2f1b51, #8f3040, #000000, #2a7d58, #5b1e35, #F24738, #FF9600
