calvinmccarter/maddness-old

This page describes how to reproduce the experimental results reported in our paper.

Note that this page and the clean, easy-to-use version of our code are still under construction; see https://smarturl.it/Maddness for the latest version.

Install Dependencies

To run the experiments, you will first need to obtain the following tools / libraries and datasets.

C++ Code

Xcode, to build and run the C++ timing benchmarks (this is the "official" setup: it is much better tested and is the one that actually works at the moment)
Bazel, Google's open-source build system (support coming soon...)

Python Code

Joblib - for caching function output
Scikit-learn - for k-means
Kmc2 - for k-means seeding
Pandas - for storing results and reading in data
Seaborn - for plotting, if you want to reproduce our figures
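Before running anything, it can help to confirm that the Python packages above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_dependencies is hypothetical, and the import names (notably sklearn for Scikit-learn) are the usual ones for these packages rather than something this page specifies.

```python
import importlib.util

# Usual import names for the packages listed above (assumed, not taken from the repo).
REQUIRED_MODULES = ["joblib", "sklearn", "kmc2", "pandas", "seaborn"]


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Python dependencies found.")
```

Seaborn is only needed for reproducing the figures, so a report that it is missing does not block the timing or accuracy experiments.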

Datasets

Download Caltech 101 and the UCR Time Series Archive.
Edit python/datasets/paths.py to point to where you're storing them.

The activations and weights from the CIFAR-10 and CIFAR-100 datasets are already included under python/assets.

View Existing Results

All results are in python/results/amm. The timing results are in the subdirectory timing.
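To get an overview of what is stored there, a short script can walk the directory and separate the timing files from the rest. This is only a sketch under stated assumptions: the function list_result_files is hypothetical, it assumes you run it from the repository root, and it makes no assumption about the file names or formats inside python/results/amm.

```python
from pathlib import Path


def list_result_files(results_dir="python/results/amm"):
    """Return (timing_files, other_files) as sorted paths relative to results_dir."""
    root = Path(results_dir)
    timing, other = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Files below the "timing" subdirectory hold the timing results.
        in_timing = len(rel.parts) > 1 and rel.parts[0] == "timing"
        (timing if in_timing else other).append(rel.as_posix())
    return timing, other


if __name__ == "__main__":
    timing_files, other_files = list_result_files()
    print(f"{len(timing_files)} timing file(s), {len(other_files)} other result file(s)")
    for name in timing_files + other_files:
        print(" ", name)
```

If the files turn out to be tabular text, they can then be loaded with Pandas, which is already among the dependencies.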

Reproduce Timing / Throughput results

The C++ code is driven by Catch, run via Xcode. You can just open Bolt.xcodeproj (Maddness was built as a fork of Bolt) and press run with the appropriate arguments. The arguments are Catch tag filters that select which benchmarks to run. For the different experiments, they are:

f() speed for various methods: [scan][amm]~[old]
g() speed for various methods: [encode][amm]~[old]
h() speed for various methods (not reported in the paper, but interesting): [lut][amm]~[old]
Overall AMM speed: [matmul][amm]~[old]
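In Catch's tag syntax, adjacent tags must all be present on a test, and a tag prefixed with ~ excludes any test carrying it. So [scan][amm]~[old] selects tests tagged both scan and amm but not old. The sketch below models just this simple form in Python to make the selection rule concrete. It is an illustration, not Catch's implementation: the function names are hypothetical and it ignores the rest of Catch's filter syntax (commas, wildcards, test names).

```python
import re


def parse_tag_filter(expr):
    """Split a filter such as '[scan][amm]~[old]' into (required, excluded) tag sets."""
    required, excluded = set(), set()
    for negated, tag in re.findall(r"(~?)\[([^\]]+)\]", expr):
        (excluded if negated else required).add(tag)
    return required, excluded


def is_selected(test_tags, expr):
    """True if a test with the given tags would be run under this simplified filter."""
    required, excluded = parse_tag_filter(expr)
    tags = set(test_tags)
    # Every required tag must be present, and no excluded tag may be present.
    return required <= tags and not (excluded & tags)


if __name__ == "__main__":
    expr = "[scan][amm]~[old]"
    for tags in (["scan", "amm"], ["scan", "amm", "old"], ["encode", "amm"]):
        print(tags, "->", is_selected(tags, expr))
```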

We highly recommend running this when the machine is otherwise idle. Also note that we haven't yet automated having the C++ code dump results into the appropriate files, so you'll have to manually paste the output into the corresponding file in python/results/amm/timing.

Coming soon: Working Bazel build for all the code and wrapper shell scripts to run and store the output of each experiment.

Reproduce Accuracy Results

From the python directory, run python -m python.amm_main. This will run all the methods we showed in the body of the paper (and some others that run quickly) on CIFAR-10, CIFAR-100, Caltech 101 (using both the Sobel and Gaussian filters), and the datasets from the UCR Time Series Archive.

Reproduce Plots

From the python directory, run python -m python.amm_figs2. You can uncomment different lines in main to only produce subsets of the plots.

Other notes

Our method is called Mithral in the source code, not Maddness.
