sean-chester / SkyBench

Collection of algorithms in C++ for main-memory skyline computation

SkyBench

Version 1.1

© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh



Introduction

The SkyBench software suite contains software for efficient main-memory computation of skylines. The state-of-the-art sequential (i.e., single-threaded) and multi-core (i.e., multi-threaded) algorithms are included.

The skyline operator [1] identifies the so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the problem is often presented as finding the silhouette of Manhattan: if one knows the position of the corner points of every building, what parts of which buildings are visible from across the river? The two-dimensional case is trivial to solve and is not the focus of SkyBench.

In higher dimensions, the problem is formalised with the concept of dominance: a point p is dominated by another point q if q has better or equal values for every attribute and the points are distinct. All points that are not dominated are part of the skyline. For example, if the points correspond to hotels, then any hotel that is more expensive, farther from anything of interest, and lower-rated than another choice would not be in the skyline. In the table below, Marge's Motel is dominated by Happy Hostel, because it is more expensive, farther from Central Station, and lower-rated, so it is not in the skyline. On the other hand, The Grand has the best rating and Happy Hostel has the best price. Lovely Lodge does not have the best value for any one attribute, but neither The Grand nor Happy Hostel outperforms it on every attribute, so it too is in the skyline and represents a good balance of the attributes.

Name            Price per Night   Rating   Distance to Central Station   In skyline?
The Grand       $325              ⋆⋆⋆⋆⋆    1.2km                         Yes
Marge's Motel   $55               ⋆⋆       3.6km                         No
Happy Hostel    $25               ⋆⋆⋆      0.4km                         Yes
Lovely Lodge    $100              ⋆⋆⋆⋆     8.2km                         Yes
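The dominance definition can be checked directly on the four hotels. The following is a minimal sketch, not code from SkyBench: the names Hotel, dominates, skyline, and example_hotels are hypothetical, and the rating is negated so that smaller values are preferred on every attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Point = std::vector<double>;

// A named point; attrs holds {price, -stars, distance}, so smaller is better.
struct Hotel {
    std::string name;
    Point attrs;
};

// True if q dominates p: q is no worse on every attribute and
// strictly better on at least one (i.e. the points are distinct).
bool dominates(const Point& q, const Point& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Quadratic-time skyline: keep every hotel that no other hotel dominates.
std::vector<std::string> skyline(const std::vector<Hotel>& hotels) {
    std::vector<std::string> result;
    for (const Hotel& candidate : hotels) {
        bool dominated = false;
        for (std::size_t j = 0; j < hotels.size() && !dominated; ++j) {
            dominated = dominates(hotels[j].attrs, candidate.attrs);
        }
        if (!dominated) result.push_back(candidate.name);
    }
    return result;
}

// The four hotels from the table, as {price, -stars, distance in km}.
std::vector<Hotel> example_hotels() {
    return {{"The Grand", {325.0, -5.0, 1.2}},
            {"Marge's Motel", {55.0, -2.0, 3.6}},
            {"Happy Hostel", {25.0, -3.0, 0.4}},
            {"Lovely Lodge", {100.0, -4.0, 8.2}}};
}
```

Calling skyline(example_hotels()) returns The Grand, Happy Hostel, and Lovely Lodge, in input order. This brute-force loop compares every pair of points, which is the cost that the algorithms in SkyBench are designed to reduce.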

As the number of dimensions/attributes increases, both the size of the skyline and the difficulty of computing it increase. Parallel algorithms, such as those implemented here, quickly become necessary.
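The growth of the skyline with dimensionality is easy to observe on synthetic data. The sketch below is an illustration only and is not the benchmark generator from [1]: it draws independent uniform points with the standard library and counts the non-dominated ones by brute force. The function name random_skyline_size and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Point = std::vector<double>;

// True if q dominates p, with smaller values preferred on every attribute.
bool dominates(const Point& q, const Point& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Counts the skyline points among n points drawn uniformly from [0,1)^dims.
std::size_t random_skyline_size(std::size_t n, std::size_t dims, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<Point> pts(n, Point(dims));
    for (Point& p : pts) {
        for (double& v : p) v = dist(gen);
    }

    std::size_t count = 0;
    for (const Point& p : pts) {
        bool dominated = false;
        // A point never dominates itself, so no index check is needed.
        for (std::size_t j = 0; j < n && !dominated; ++j) {
            dominated = dominates(pts[j], p);
        }
        if (!dominated) ++count;
    }
    return count;
}
```

Calling random_skyline_size(2000, d, 42) for d = 2, 4, and 8 should show the count growing with d. The exact counts depend on the standard library's distribution implementation, so none are quoted here.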

SkyBench is released in conjunction with our recent ICDE paper [2]. All of the code and scripts necessary to repeat experiments from that paper are available in this software suite. To the best of our knowledge, this is also the first publicly released C++ skyline software, which will hopefully be a useful resource for the academic and industry research communities.


Algorithms

The following algorithms have been implemented in SkyBench:

  • Hybrid [2]: Located in src/hybrid. It is the state-of-the-art multi-core algorithm, based on two-level quad-tree partitioning of the data and memoisation of point-to-point relationships.

  • Q-Flow [2]: Located in src/qflow. It is a simplification of Hybrid to demonstrate control flow.

  • PSkyline [3]: Located in src/pskyline. It was the previous state-of-the-art multi-core algorithm, based on a divide-and-conquer paradigm.

  • BSkyTree [4]: Located in src/bskytree. It is the state-of-the-art sequential algorithm, based on a quad-tree partitioning of the data and memoisation of point-to-point relationships.

All four algorithms are implementations of the common interface defined in common/skyline_i.h and use common dominance tests from common/common.h and common/dt_avx.h (the latter when vectorisation is enabled).
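To give a feel for why quad-tree-style partitioning helps, the sketch below assigns each point a bitmask describing its region relative to a pivot and uses the masks to rule out dominance before comparing attributes. It illustrates the general idea only and is not taken from src/bskytree or src/hybrid; the names region_mask, may_dominate, and dominates_with_masks are hypothetical, and smaller values are assumed to be preferred.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Point = std::vector<float>;

// Bit i is set when p is no better than the pivot on attribute i.
// The resulting mask identifies one of the 2^d regions around the pivot.
std::uint32_t region_mask(const Point& p, const Point& pivot) {
    assert(p.size() == pivot.size() && p.size() <= 32);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] >= pivot[i]) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// Necessary condition for "q dominates p": every bit set in q's mask must
// also be set in p's mask. If it fails, the full test can be skipped.
bool may_dominate(std::uint32_t mask_q, std::uint32_t mask_p) {
    return (mask_q & ~mask_p) == 0;
}

// Full attribute-by-attribute dominance test.
bool dominates(const Point& q, const Point& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Consults the stored masks first and only then compares the attributes.
bool dominates_with_masks(const Point& q, std::uint32_t mask_q,
                          const Point& p, std::uint32_t mask_p) {
    return may_dominate(mask_q, mask_p) && dominates(q, p);
}
```

The masks are computed once per point and reused for every later comparison, so a pair of points in incomparable regions costs a single bitwise operation instead of a pass over all attributes.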


Datasets

For reproducibility of the experiments in [2], we include three datasets. The WEATHER dataset was originally obtained from The University of East Anglia Climatic Research Unit and preprocessed for skyline computation. We also include two classic skyline datasets, exactly as used in [2]: NBA and HOUSE.

The synthetic workloads can be generated with the standard benchmark skyline data generator [1] hosted on pgfoundry.


Requirements

SkyBench requires the following:

  • A C++ compiler that supports C++11 and OpenMP (e.g., a recent version of the GNU compiler)

  • The GNU make program

  • A CPU with AVX or AVX2 support, if vectorised dominance tests are to be used


Usage

To run, the code needs to be compiled with the given number of dimensions.^ For example, to compute the skyline of the 8-dimensional NBA data set located in workloads/nba-U-8-17264.csv, do:

make all DIMS=8

./bin/SkyBench -f workloads/nba-U-8-17264.csv

By default, it will compute the skyline with all algorithms. Running ./bin/SkyBench without parameters will provide more details about the supported options.

You can make use of the provided shell script (script/runExp.sh) that does all of the above automatically. For details, execute:

./script/runExp.sh

To reproduce the experiment with real datasets (Table II in [2]), do (assuming a 16-core machine):

./script/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid"

^For performance reasons, skyline implementations that we obtained from other authors compile their code for a specific number of dimensions. For a fair comparison, we adopted the same approach.
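As a rough illustration of what compiling for a fixed number of dimensions can look like, the sketch below fixes the dimensionality with a preprocessor constant. The macro name NUM_DIMS, the Tuple alias, and the helper to_tuple are assumptions made for this example; how the DIMS value from make reaches the SkyBench sources is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro; it could be set on the compiler command line,
// for example with -DNUM_DIMS=8. The default only keeps the sketch compilable.
#ifndef NUM_DIMS
#define NUM_DIMS 8
#endif

static_assert(NUM_DIMS > 0, "at least one dimension is required");

// Fixed-size tuple: no heap allocation and a loop bound known at compile time.
using Tuple = std::array<float, NUM_DIMS>;

// Dominance test with smaller values preferred. Because NUM_DIMS is a
// constant, the compiler is free to unroll this loop.
bool dominates(const Tuple& q, const Tuple& p) {
    bool strictly_better = false;
    for (std::size_t i = 0; i < NUM_DIMS; ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Converts one runtime-sized input row into a Tuple. The row must have
// exactly NUM_DIMS values, which is why the binary has to be built for
// the dimensionality of the input file.
Tuple to_tuple(const std::vector<float>& row) {
    assert(row.size() == NUM_DIMS);
    Tuple t{};
    std::copy(row.begin(), row.end(), t.begin());
    return t;
}
```

The trade-off is that one binary serves one dimensionality, so switching to a dataset with a different number of attributes means rebuilding.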


License

This software is subject to the terms of The MIT License, which has been included in this repository.


Contact

This software suite will be expanded soon with new algorithms, so you are encouraged to check that this is still the latest version. Please do not hesitate to contact the authors if you have comments, questions, or bugs to report.

SkyBench on GitHub


References

[1] S. Börzsönyi, D. Kossmann, and K. Stocker. (2001) "The Skyline Operator." In Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), 421--432. http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

[2] S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh. (2015) "Scalable parallelization of skyline computation for multi-core processors." In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015), 1083--1094. http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf

[3] H. Im, J. Park, and S. Park. (2011) "Parallel skyline computation on multicore architectures." Information Systems 36(4): 808--823. http://dx.doi.org/10.1016/j.is.2010.10.005

[4] J. Lee and S. Hwang. (2014) "Scalable skyline computation using a balanced pivot selection technique." Information Systems 39: 1--21. http://dx.doi.org/10.1016/j.is.2013.05.005

