Imputation of Missing Values in Time Series Benchmark vldb20
Repository structure
- Algorithms - missing blocks recovery algorithms: CDRec, STMVL, TRMF, TKCM, SPIRIT, TeNMF, GROUSE, SVDImpute, SoftImpute, SVT, ROSL, DynaMMo.
- Datasets - different datasets and time series from different sources.
- Testing Framework - a program to run automated suite of tests on the datasets with the algorithms mentioned above.
Prerequisites and dependencies (Linux)
- Ubuntu 16 and higher (or Ubuntu derivatives like Xubuntu)
- Sudo rights on the user
- Clone the repository
$ git clone https://github.com/eXascaleInfolab/bench-vldb19.git
- Mono Runtime and Compiler: follow step 1 from the installation guide in https://www.mono-project.com/download/stable/ for your Ubuntu version and afterwards do:
$ sudo apt-get install mono-devel
- All other prerequisites will be installed using a build script.
Build & tests
Restart the terminal window after all the dependencies are installed. Open it in the root folder of the repository.
- Build all the algorithms and Testing Framework using a script in the root folder (takes up to 5 minutes depending which prerequisites are already installed in the system):
$ sh install_linux.sh
- Run the benchmark:
$ cd TestingFramework/bin/Debug/
$ mono TestingFramework.exe
- Test suite will go over datasets one by one and executes all the scenarios for them with both precision test and runtime test. Plots folder in the root of the repository will be populated with the results.
- Remark: full test suite with the default setup will take a sizeable amount of time to run (up to 2 days depending on the hardware) and will produce up to 20GB of output files with all recovered data and plots unless stopped early.
Customize datasets
To add a dataset to the benchmark
- import the file to
TestingFramework/bin/Debug/data/{name}/{name}_normal.txt
-
- Requirements: >= 10 columns, >= 1'000 rows, column separator - empty space, row separator - newline
- add
{name}
to the list of datasets inTestingFramework/config.cfg
Customize algorithms
To exclude an algorithm from the benchmark
- open the file
TestingFramework/config.cfg
- add an entry
IgnoreAlgorithms =
and specify the list of algorithm codes to exclude them - the line starting with
#IgnoreAlgorithms =
provides codes for all the algorithms in the benchmark
Prerequisites and dependencies (macOS) -- Experimental
- It is possible to make the benchmark work on macOS with a few caveats:
-
- TRMF algorithm does not work with octave (on macOS), so it will be disabled.
-
- The installation can take longer than Linux. The longest processes are the installation of LLVM with brew and the compilation of mlpack.
- macOS 10.13 or higher, homebrew
- Sudo rights on the user
- Clone the repository
$ xcode-select --install
$ git clone https://github.com/eXascaleInfolab/bench-vldb19.git
- If you're running macOS 10.14 you also have to install C/C++ headers by typing the command below and going through the installation screen:
$ open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
- Mono Runtime and Compiler: Install the package provided by Mono in https://www.mono-project.com/download/stable/
- All other prerequisites will be installed using a build script.
Build & tests
- Restart the terminal window after all the dependencies are installed. Open it in the root folder of the repository.
- Build all the algorithms and Testing Framework using a script in the root folder (takes up to 10-12 minutes depending which prerequisites are already installed in the system):
$ sh install_mac.sh
- Run the benchmark:
$ cd TestingFramework/bin/Debug/
$ mono TestingFramework.exe
Customize datasets and algorithms
The process is identical to Linux.