
Sinkhorn Barycenters via Frank-Wolfe algorithm


Free-Support Sinkhorn Barycenters

This repository complements the paper Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm by Giulia Luise, Saverio Salzo, Massimiliano Pontil, and Carlo Ciliberto, published at Neural Information Processing Systems (NeurIPS) 2019, by providing an implementation of the proposed algorithm to compute the barycenter of multiple probability measures with respect to the Sinkhorn divergence.

Slides can be found here.

If you are interested in using the proposed algorithm in your projects please refer to the instructions here.

Below we provide the code and instructions to reproduce most of the experiments in the paper. We recommend running all experiments on GPU.
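For orientation, the objective underlying all experiments is the Sinkhorn divergence between probability measures. The snippet below is a minimal NumPy sketch of entropy-regularized optimal transport and the standard debiased divergence between two discrete measures; it is not the repository's implementation, and the function names and parameter values (`eps`, `n_iter`) are illustrative assumptions.

```python
# Minimal NumPy sketch (not the repository's code) of the Sinkhorn divergence
# between two discrete measures a, b supported on point clouds x, y.
import numpy as np

def sinkhorn_cost(a, x, b, y, eps=0.05, n_iter=200):
    """Entropy-regularized OT cost between (a, x) and (b, y),
    with squared Euclidean ground cost and regularization eps."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-C / eps)                                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                               # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                       # approximate transport plan
    return (P * C).sum()

def sinkhorn_divergence(a, x, b, y, eps=0.05):
    """Debiased divergence S(a, b) = OT(a, b) - (OT(a, a) + OT(b, b)) / 2."""
    return (sinkhorn_cost(a, x, b, y, eps)
            - 0.5 * sinkhorn_cost(a, x, a, x, eps)
            - 0.5 * sinkhorn_cost(b, y, b, y, eps))
```

Here a and b are probability vectors and x, y the matrices of their support points; the precise definition of the divergence used in the experiments follows the paper, and this sketch only captures the standard debiased form.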

List of Experiments

Dependencies

The core dependencies are:

For some experiments we also have the following additional dependencies:

A Classic: Barycenter of Nested Ellipses

We compute the barycenter of 30 randomly generated nested ellipses on a 50 × 50 pixel image, similarly to (Cuturi and Doucet 2014). We interpret each image as a probability distribution in 2D. The cost matrix is given by the squared Euclidean distances between pixels. The figure reports 8 samples of the input ellipses (all examples can be found in the folder data/ellipses) and, in the middle, the barycenter obtained with the proposed algorithm. It shows qualitatively that our approach captures key geometric properties of the input measures.

Run:

$ python experiments/ellipses.py

Output in folder out/ellipses
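
For readers who want to adapt the experiment, the preprocessing described above (image → discrete measure, squared Euclidean cost between pixels) can be sketched roughly as follows. This is an illustration and not necessarily what experiments/ellipses.py does; `img` is a placeholder array and the rescaling to the unit square is a choice made here for clarity.

```python
# Sketch (assumption: not necessarily the script's preprocessing): turn a 50x50
# grayscale image into a discrete probability measure in 2D and build the
# squared Euclidean cost matrix between its support points.
import numpy as np

img = np.random.rand(50, 50)                 # placeholder for a nested-ellipse image

ii, jj = np.nonzero(img > 0)                 # pixels carrying mass
support = np.stack([ii, jj], axis=1) / 50.0  # support points rescaled to [0, 1]^2
weights = img[ii, jj] / img[ii, jj].sum()    # normalized pixel intensities

# squared Euclidean distances between all pairs of support points
cost = ((support[:, None, :] - support[None, :, :]) ** 2).sum(-1)
```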

Continuous Measures: Barycenter of Gaussian Distributions (Coming Soon)

We compute the barycenter of 5 Gaussian distributions with randomly generated means and covariance matrices. We apply the algorithm to the empirical measures obtained by sampling n = 500 points from each one. Since the (Wasserstein) barycenter of Gaussian distributions can be estimated accurately (Cuturi and Doucet 2014) (see (Agueh and Carlier 2011)), in the figure we report both the output of the proposed algorithm (as a scatter plot) and the true Wasserstein barycenter (as level sets of its density). We observe that our estimator recovers both the mean and covariance of the target barycenter.

Run:

$ python experiments/gaussians.py

Output in folder out/gaussians

Instructions for additional experiments and parameters can be found directly in the file experiments/gaussians.py.
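
The reference solution mentioned above, the true Wasserstein barycenter of Gaussians, is itself Gaussian: its mean is the weighted average of the input means and its covariance solves a fixed-point equation (Agueh and Carlier 2011). A minimal sketch of that fixed-point iteration, using SciPy's matrix square root, is given below; it is an illustration only, and the script may compute the reference barycenter differently.

```python
# Fixed-point iteration for the 2-Wasserstein barycenter of Gaussians
# N(m_i, C_i): the barycenter is N(sum_i w_i m_i, S), where S solves
# S = sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2}.  Illustration only.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycenter(means, covs, weights, n_iter=100):
    mean = sum(w * m for w, m in zip(weights, means))
    S = np.eye(len(mean))                        # initial covariance guess
    for _ in range(n_iter):
        root = np.real(sqrtm(S))                 # S^{1/2}
        S = sum(w * np.real(sqrtm(root @ C @ root))
                for w, C in zip(weights, covs))
    return mean, S
```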

Distribution Matching

Similarly to (Claici et al. 2018), we test the proposed algorithm in the special case where we compute the “barycenter” of a single measure (rather than of multiple ones). While the solution of this problem is the input distribution itself, we can interpret the intermediate iterates of the proposed algorithm as compressed versions of the original measure. In this sense the iteration number k represents the level of compression, since the corresponding barycenter estimate is supported on at most k points. The figure (Right) reports iteration k = 5000 of the proposed algorithm applied to the 140 × 140 image in (Left), interpreted as a probability measure in 2D. We note that the number of points in the support is ∼3900: the most relevant support points are selected multiple times to accumulate the right amount of mass on each of them (darker color = higher weight). This shows that the proposed approach tends to greedily search for the most relevant support points, prioritizing those with higher weight.

Run:

$ python experiments/matching.py

Output in folder out/matching

The code can be run with any image by passing the path to the desired image as an additional argument.
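
The behaviour described above (support of size at most k after k iterations, mass accumulating on re-selected points) follows directly from the Frank-Wolfe update, which mixes the current iterate with a single new Dirac at every step. A schematic illustration is sketched below; the step size and the selection oracle are placeholders, not the algorithm's actual choices.

```python
# Schematic illustration of how a Frank-Wolfe iterate over measures accumulates
# mass: every iteration mixes the current iterate with one new Dirac, so after
# k iterations the support has at most k points, and points selected repeatedly
# carry more weight (the darker pixels in the figure).
from collections import defaultdict

def frank_wolfe_support(select_point, n_iter=5000):
    weights = defaultdict(float)          # current iterate: support point -> mass
    for k in range(n_iter):
        gamma = 1.0 / (k + 1)             # placeholder Frank-Wolfe step size
        x = select_point(weights)         # stand-in for the linear-minimization oracle
        for p in weights:
            weights[p] *= 1.0 - gamma     # shrink all existing mass
        weights[x] += gamma               # place mass gamma on the new Dirac
    return dict(weights)
```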

Sinkhorn k-Means Clustering

We test the proposed algorithm on a k-means clustering experiment. We consider a subset of 500 random images from the MNIST dataset. Each image is suitably normalized to be interpreted as a probability distribution on the grid of 28 × 28 pixels with values scaled between 0 and 1. We initialize 20 centroids according to the k-means++ strategy. The figure depicts the corresponding 20 centroids obtained throughout this process. We see that the structure of the digits is successfully detected, recovering also minor details (e.g. note the difference between the 2 centroids).

Run:

$ python experiments/kmeans.py

Output in folder out/kmeans
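
For reference, the two preprocessing choices mentioned above, normalizing each MNIST image into a probability vector and seeding centroids with k-means++, can be sketched as follows. The `dist` argument is a placeholder for whatever divergence is used to compare distributions (presumably the Sinkhorn divergence), and the function names are illustrative rather than the script's API.

```python
# Sketch (not experiments/kmeans.py itself) of the preprocessing and seeding
# steps described above.
import numpy as np

def normalize_images(images):
    """Flatten 28x28 images and rescale each one into a probability vector."""
    flat = images.reshape(len(images), -1).astype(float)
    return flat / flat.sum(axis=1, keepdims=True)

def kmeanspp_init(distributions, k, dist, seed=0):
    """k-means++ seeding: each new seed is drawn with probability proportional
    to its squared distance from the closest seed chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [distributions[rng.integers(len(distributions))]]
    for _ in range(k - 1):
        d2 = np.array([min(dist(x, c) for c in centroids) ** 2
                       for x in distributions])
        idx = rng.choice(len(distributions), p=d2 / d2.sum())
        centroids.append(distributions[idx])
    return centroids
```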

Sinkhorn Propagation (Coming Soon)


We consider the problem of Sinkhorn propagation, similar to the Wasserstein propagation in (Solomon et al. 2014). The goal is to predict the distribution of missing measurements for weather stations in the state of Texas, US (data from National Climatic Weather Data) by propagating measurements from neighboring stations in the network. The problem can be formulated as minimizing the functional

$$F\big((\rho_v)_{v\in\mathcal{V}_0}\big) \;=\; \sum_{(u,v)\in\mathcal{E}} \omega_{uv}\,\mathrm{S}(\rho_u,\rho_v)$$

over the set $\{\rho_v\in\mathcal{M}_1^+(\mathbb{R}^2) : v\in\mathcal{V}_0\}$, with: $\mathcal{V}_0\subset\mathcal{V}$ the subset of stations with missing measurements, $G = (\mathcal{V},\mathcal{E})$ the whole graph of the stations network, and $\omega_{uv}$ a weight inversely proportional to the geographical distance between two vertices/stations $u,v\in\mathcal{V}$. The variable $\rho_v\in\mathcal{M}_1^+(\mathbb{R}^2)$ denotes the distribution of measurements of daily temperature and atmospheric pressure at station $v$ over one year. This is a generalization of the barycenter problem. From the total of $|\mathcal{V}| = 115$ stations, we randomly select 10%, 20% or 30% to be available, and use the proposed algorithm to propagate their measurements to the remaining “missing” ones. We compare our approach (FW) with the Dirichlet (DR) baseline in (Solomon et al. 2014) in terms of the error $d(C_T,\hat C)$ between the covariance matrix $C_T$ of the ground-truth distribution and that of the predicted one. Here $d(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|$ is the geodesic distance on the cone of positive definite matrices. In the figures above we qualitatively report the improvement $\Delta = d(C_T,C_{DR}) - d(C_T,C_{FW})$ of our method on individual stations: a higher color intensity corresponds to a wider gap in our favor between prediction errors, from light green $(\Delta\sim 0)$ to red $(\Delta\sim 2)$. Our approach tends to propagate the distributions to missing locations with higher accuracy.

Run:

$ python experiments/propagation.py

Output in folder out/propagation
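
The evaluation metric above, the geodesic distance on the cone of positive definite matrices, can be computed directly from its definition $d(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|$. A small sketch using SciPy follows; it is an illustration, not necessarily the script's implementation.

```python
# Geodesic (affine-invariant) distance between SPD covariance matrices,
# d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F, used to compare predicted and
# ground-truth covariances.  Illustration only.
import numpy as np
from scipy.linalg import sqrtm, logm

def spd_geodesic_distance(A, B):
    A_inv_sqrt = np.linalg.inv(np.real(sqrtm(A)))    # A^{-1/2}
    M = A_inv_sqrt @ B @ A_inv_sqrt                  # congruence-transformed B
    return np.linalg.norm(np.real(logm(M)), ord="fro")
```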

References

Agueh, M. and Carlier, G. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 2011.

Claici, S., Chien, E., and Solomon, J. Stochastic Wasserstein barycenters. In International Conference on Machine Learning (ICML), 2018.

Cuturi, M. and Doucet, A. Fast computation of Wasserstein barycenters. In International Conference on Machine Learning (ICML), 2014.

Luise, G., Salzo, S., Pontil, M., and Ciliberto, C. Sinkhorn barycenters with free support via Frank-Wolfe algorithm. In Advances in Neural Information Processing Systems (NeurIPS), 2019.

Solomon, J., Rustamov, R., Guibas, L., and Butscher, A. Wasserstein propagation for semi-supervised learning. In International Conference on Machine Learning (ICML), 2014.
