antoninschrab / mmdfuse-paper

Reproducibility code for MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting, by Biggs, Schrab, and Gretton: https://arxiv.org/abs/2306.08777

Reproducibility code for MMD-FUSE

This GitHub repository contains the code for the reproducible experiments presented in our paper MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting.

We provide the code to run the experiments that generate the figures and tables of our paper; these can be found in the figures directory.

To use MMD-FUSE in practice, we recommend using our mmdfuse package; more details are available in the mmdfuse repository.

Requirements

  • python 3.9

Only the jax and jaxlib packages are required to run MMD-FUSE (see mmdfuse). Several other packages are required to run the other tests we compare against (see env_mmdfuse.yml and env_autogluon.yml).

Installation

In a chosen directory, clone the repository and change to its directory by executing

git clone git@github.com:antoninschrab/mmdfuse-paper.git
cd mmdfuse-paper

We then recommend creating and activating a virtual environment using conda by running

conda env create -f env_mmdfuse.yml
conda env create -f env_autogluon.yml
conda activate mmdfuse-env
# conda activate autogluon-env
# can be deactivated by running:
# conda deactivate

Reproducing the experiments of the paper

The results of the six experiments can be reproduced by running the code in the notebooks: experiments_mixture.ipynb, experiments_perturbations.ipynb, experiment_perturbations_vary_kernel.ipynb, experiments_galaxy.ipynb, experiments_cifar.ipynb, and experiments_runtimes.ipynb.

The results are saved as .npy files in the results directory. The figures of the paper can be obtained from these by running the code in the figures.ipynb notebook.
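
For example, a saved results array can be inspected with numpy; the file name below is a hypothetical placeholder, not a file guaranteed to exist in the results directory.

import numpy as np

# Hypothetical file name: replace with an actual .npy file from results/.
results = np.load("results/mixture.npy")
print(results.shape)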

All the experiments consist of embarrassingly parallel for-loops; a significant speed-up can be obtained by using parallel computing libraries such as joblib or dask, as sketched below.
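
As an illustration, here is a minimal joblib sketch of how independent test repetitions can be run across cores; run_single_test and the number of repetitions are hypothetical placeholders, not names from this repository.

from joblib import Parallel, delayed
import numpy as np

def run_single_test(seed):
    # Placeholder for one experiment repetition: sample data with the
    # given seed, run the test, and return 1 (reject) or 0 (accept).
    rng = np.random.default_rng(seed)
    return int(rng.uniform() < 0.05)  # dummy outcome

# Run 200 independent repetitions across all available cores.
outputs = Parallel(n_jobs=-1)(delayed(run_single_test)(seed) for seed in range(200))
power_estimate = np.mean(outputs)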

Datasets

Samplers

How to use MMD-FUSE in practice?

The MMD-FUSE test is implemented in Jax as the function mmdfuse in mmdfuse.py. It requires only the jax and jaxlib packages.

To use our tests in practice, we recommend using our mmdfuse package, which is available in the mmdfuse repository. It can be installed by running

pip install git+https://github.com/antoninschrab/mmdfuse.git

Installation instructions and example code are available on the mmdfuse repository.
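
As a quick illustration, here is a minimal usage sketch; the call signature mmdfuse(X, Y, key) and the 0/1 output convention are assumptions on our part, so please check the mmdfuse repository for the exact interface.

# Minimal usage sketch; signature assumed, see the mmdfuse repository.
from jax import random
from mmdfuse import mmdfuse

key = random.PRNGKey(0)
subkeys = random.split(key, num=3)
X = random.uniform(subkeys[0], shape=(500, 10))      # sample from P
Y = random.uniform(subkeys[1], shape=(500, 10)) + 1  # sample from Q
output = mmdfuse(X, Y, subkeys[2])  # assumed: 1 = reject H0: P = Q, 0 = fail to reject
print(output.item())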

We also provide some code showing how to use MMD-FUSE in the demo_speed.ipynb notebook, which also contains speed comparisons between running the code on CPU and on GPU:

Speed in s    Jax (GPU)    Jax (CPU)
MMD-FUSE      0.0054       2.95
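
As an aside, here is a minimal sketch of how such runtimes can be measured in Jax: the first call triggers JIT compilation and is excluded, and block_until_ready() is needed because Jax dispatches asynchronously. The mmdfuse signature and its Jax-array output are assumed as above.

import time
from jax import random
from mmdfuse import mmdfuse

keys = random.split(random.PRNGKey(42), num=3)
X = random.normal(keys[0], shape=(2000, 10))
Y = random.normal(keys[1], shape=(2000, 10))

# Warm-up run: includes JIT compilation, so it is excluded from the timing.
mmdfuse(X, Y, keys[2]).block_until_ready()

start = time.time()
mmdfuse(X, Y, keys[2]).block_until_ready()  # timed compiled run
print(f"MMD-FUSE runtime: {time.time() - start:.4f}s")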

References

Interpretable Distribution Features with Maximum Testing Power. Wittawat Jitkrittum, Zoltán Szabó, Kacper Chwialkowski, Arthur Gretton. (paper, code)

Learning Deep Kernels for Non-Parametric Two-Sample Tests. Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, Danica J. Sutherland. (paper, code)

MMD Aggregated Two-Sample Test. Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton. (paper, code)

AutoML Two-Sample Test. Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf. (paper, code)

Compress Then Test: Powerful Kernel Testing in Near-linear Time. Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey. (paper, code)

Contact

If you have any issues running the code, please do not hesitate to contact Antonin Schrab.

Affiliations

Centre for Artificial Intelligence, Department of Computer Science, University College London

Gatsby Computational Neuroscience Unit, University College London

Inria London

Bibtex

@article{biggs2023mmdfuse,
  author        = {Biggs, Felix and Schrab, Antonin and Gretton, Arthur},
  title         = {{MMD-FUSE}: {L}earning and Combining Kernels for Two-Sample Testing Without Data Splitting},
  year          = {2023},
  journal       = {Advances in Neural Information Processing Systems},
  volume        = {36}
}

License

MIT License (see LICENSE.md).

Related tests

  • mmdagg: MMD Aggregated MMDAgg test
  • ksdagg: KSD Aggregated KSDAgg test
  • agginc: Efficient MMDAggInc HSICAggInc KSDAggInc tests
  • dpkernel: Differentially private dpMMD dpHSIC tests
  • dckernel: Robust to Data Corruption dcMMD dcHSIC tests
