Cut-Pursuit Algorithms, Parallelized Along Components

Generic C++ classes for implementing cut-pursuit algorithms.
Specialization to convex problems involving graph total variation, and nonconvex problems involving contour length, as explained in our articles (Landrieu and Obozinski, 2017; Raguet and Landrieu, 2018).
Parallel implementation with OpenMP.
MEX interfaces for GNU Octave or Matlab.
Extension modules for Python.

Table of Contents

  1. General problem statement
  2. Generic C++ classes
  3. Specialization for quadratic functional and graph total variation
  4. Specialization for separable multidimensional loss and graph total variation
  5. Specialization for separable distance and contour length
  6. Directory tree
  7. C++ documentation
  8. GNU Octave or Matlab
  9. Python
  10. References
  11. License

General problem statement

The cut-pursuit algorithms minimize functionals structured, over a weighted graph G = (V, E, w), as

    F: x ∈ ℍ^V ↦ f(x) + ∑_{(u,v) ∈ E} w_{(u,v)} ψ(x_u, x_v) ,

where ℍ is some base space, and the functional ψ: ℍ² → ℝ penalizes dissimilarity between its arguments, in order to enforce solutions which are piecewise constant along the graph G.

The cut-pursuit approach is to seek partitions 𝒱 of the set of vertices V, constituting the constant connected components of the solution, by successively solving the corresponding problem, structured over the reduced graph 𝒢 = (𝒱, ℰ), that is

    arg min_{ξ ∈ ℍ^𝒱} F(x) ,    such that ∀ U ∈ 𝒱, ∀ u ∈ U, x_u = ξ_U ,

and then refining the partition.
A key requirement is thus the ability to solve the reduced problem, which often has the exact same structure as the original one, but with far fewer vertices |𝒱| ≪ |V|. If the solution of the original problem has only a few constant connected components in comparison to the number of vertices, the cut-pursuit strategy can speed up minimization by several orders of magnitude.
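
For illustration, here is a minimal sketch (plain Python, not the library's code) of how a reduced graph is derived from a partition: each component becomes one vertex, weights of edges joining two distinct components accumulate, and edges within a component vanish, since the solution is constant there.

```python
from collections import defaultdict

# Minimal sketch: build the reduced graph from a partition of the vertices.
# Names and data layout are illustrative only.
def reduce_graph(edges, weights, partition):
    # comp[u] = index of the component containing vertex u
    comp = {u: i for i, U in enumerate(partition) for u in U}
    reduced = defaultdict(float)
    for (u, v), w in zip(edges, weights):
        cu, cv = comp[u], comp[v]
        if cu != cv:  # edges inside a component cost nothing (x_u = x_v)
            reduced[min(cu, cv), max(cu, cv)] += w
    return dict(reduced)

# toy chain 0-1-2-3 split into two components {0,1} and {2,3}
print(reduce_graph([(0, 1), (1, 2), (2, 3)], [1.0, 1.0, 1.0], [{0, 1}, {2, 3}]))
# {(0, 1): 1.0}
```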

Cut-pursuit algorithms come in two main flavors, namely “directionally differentiable” and “noncontinuous”.

  • In the directionally differentiable case, the base space ℍ is typically a vector space, and it is required that f be differentiable, or at least that its nondifferentiable part be separable along the graph and admit (potentially infinite) directional derivatives. Notably, this comprises many convex problems where ψ(x_u, x_v) = ║x_u − x_v║, that is to say involving a graph total variation. The refinement of the partition is based on the search for a steep directional derivative, and the reduced problem is solved using convex or continuous optimization; optimality guarantees can be provided.

  • In the noncontinuous case, the dissimilarity penalization typically uses ψ(x_u, x_v) = 0 if x_u = x_v, 1 otherwise, resulting in a measure of the contour length of the constant connected components. The functional f is typically required to be separable along the graph, and to have computational properties favorable enough for solving reduced problems. The refinement of the partition relies on greedy heuristics.

Both flavors admit multidimensional extensions, that is to say ℍ need not be only a set of scalars.
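
As a toy illustration of the two flavors, the snippet below evaluates F on a small chain graph with both choices of ψ; it is a sketch assuming, for the sake of the example only, a quadratic fidelity f(x) = 1/2 ║y − x║².

```python
import numpy as np

# Sketch: evaluate F on a toy chain graph; only psi differs between flavors.
def F(x, y, edges, w, psi):
    fidelity = 0.5 * np.sum((y - x) ** 2)  # assumed quadratic fidelity term
    penalty = sum(wk * psi(x[u], x[v]) for (u, v), wk in zip(edges, w))
    return fidelity + penalty

psi_d1 = lambda a, b: abs(a - b)     # graph total variation (directionally differentiable)
psi_d0 = lambda a, b: float(a != b)  # contour length (noncontinuous)

y = np.array([0.1, 0.2, 0.9, 1.0])       # observations
x = np.array([0.15, 0.15, 0.95, 0.95])   # a piecewise constant candidate
edges, w = [(0, 1), (1, 2), (2, 3)], [1.0, 1.0, 1.0]
print(F(x, y, edges, w, psi_d1), F(x, y, edges, w, psi_d0))  # ≈ 0.805 and ≈ 1.005
```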

Generic C++ classes

The class Cp_graph is a modification of the Graph class of Y. Boykov and V. Kolmogorov, for making use of their maximum flow algorithm.
The class Cp is the most generic, defining all steps of the cut-pursuit approach in virtual methods.
The class Cp_d1 specializes methods for directionally differentiable cases involving the graph total variation.
The class Cp_d0 specializes methods for noncontinuous cases involving the contour length penalization.

Specialization Cp_d1_ql1b: quadratic functional, ℓ1 norm, bounds, and graph total variation

The base space is ℍ = ℝ, and the general form is

    F: x ∈ ℝ^V ↦ 1/2 ║y^(q) − A x║² + ∑_{v ∈ V} λ_v |y^(ℓ1)_v − x_v| + ∑_{v ∈ V} ι_{[m_v, M_v]}(x_v)
                 + ∑_{(u,v) ∈ E} w_{(u,v)} |x_u − x_v| ,

where y^(q) ∈ ℝ^n, A: ℝ^V → ℝ^n is a linear operator, y^(ℓ1) ∈ ℝ^V and λ ∈ ℝ^V and w ∈ ℝ^E are regularization weights, m, M ∈ ℝ^V are parameters and ι_{[a,b]} is the convex indicator of [a, b]: x ↦ 0 if x ∈ [a, b], +∞ otherwise.

When y^(ℓ1) is zero, the combination of the ℓ1 norm and the total variation is sometimes coined the fused LASSO.

When A is the identity, λ is zero and there are no box constraints, the problem boils down to the proximity operator of the graph total variation, also coined “graph total variation denoising” or “general fused LASSO signal approximation”.
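
To make the roles of y^(q), A, y^(ℓ1), λ and the box [m, M] concrete, here is a term-by-term sketch of the objective in NumPy; this is illustrative code, not the library's interface.

```python
import numpy as np

# Sketch of the Cp_d1_ql1b objective, term by term; all names are illustrative.
def ql1b_objective(x, y_q, A, y_l1, lam, m, M, edges, w):
    quad = 0.5 * np.sum((y_q - A @ x) ** 2)               # quadratic fidelity
    l1 = np.sum(lam * np.abs(y_l1 - x))                   # weighted l1 norm
    box = 0.0 if np.all((m <= x) & (x <= M)) else np.inf  # box constraints
    tv = sum(wk * abs(x[u] - x[v]) for (u, v), wk in zip(edges, w))  # graph TV
    return quad + l1 + box + tv

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = np.array([0.5, 0.5, 1.0])
print(ql1b_objective(x, A @ x, A, np.zeros(3), 0.1, 0.0, 1.0,
                     [(0, 1), (1, 2)], [1.0, 1.0]))  # ≈ 0.7: zero fidelity, l1 + TV
```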

Currently, A must be provided as a matrix. See the documentation for special cases.

The reduced problem is solved using the preconditioned forward-Douglas–Rachford splitting algorithm (see also the corresponding repository).

Two examples where A is a full ill-conditioned matrix are provided with GNU Octave or Matlab and Python interfaces: one with positivity and fused LASSO constraints on a task of brain source identification from electroencephalography, and another with boundary constraints on a task of image reconstruction from tomography.

[Figure: brain source identification example: ground truth, raw retrieved activity, identified sources]

Specialization Cp_d1_lsx: separable loss, simplex constraints, and graph total variation

The base space is ℍ = ℝ^D, where D can be seen as a set of labels, and the general form is

    F: x ∈ ℝ^{V×D} ↦ f(y, x) + ∑_{v ∈ V} ι_{Δ_D}(x_v) + ∑_{(u,v) ∈ E} w^(d1)_{(u,v)} ∑_{d ∈ D} λ_d |x_{u,d} − x_{v,d}| ,

where y ∈ ℝ^{V×D}, f is a loss functional (see below), w^(d1) ∈ ℝ^E and λ ∈ ℝ^D are regularization weights, and ι_{Δ_D} is the convex indicator of the simplex Δ_D = {x ∈ ℝ^D | ∑_d x_d = 1 and ∀ d, x_d ≥ 0}: x ↦ 0 if x ∈ Δ_D, +∞ otherwise.

The following loss functionals are available, where w^(f) ∈ ℝ^V are weights on vertices.
Linear: f(y, x) = − ∑_{v ∈ V} w^(f)_v ∑_{d ∈ D} x_{v,d} y_{v,d}
Quadratic: f(y, x) = ∑_{v ∈ V} w^(f)_v ∑_{d ∈ D} (x_{v,d} − y_{v,d})²
Smoothed Kullback–Leibler divergence (equivalent to cross-entropy):
f(y, x) = ∑_{v ∈ V} w^(f)_v KL(α u + (1 − α) y_v, α u + (1 − α) x_v),
where α ∈ ]0,1[, u ∈ Δ_D is the uniform discrete distribution over D, and KL: (p, q) ↦ ∑_{d ∈ D} p_d log(p_d/q_d).
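
As an illustration, here is a sketch of the smoothed Kullback–Leibler loss above in NumPy; names and data layout are only for the example.

```python
import numpy as np

# Sketch of the smoothed Kullback-Leibler loss; y and x are (|V|, D) arrays of
# distributions over D, w_f the vertex weights, alpha the smoothing in ]0,1[.
def smoothed_kl_loss(y, x, w_f, alpha):
    D = y.shape[1]
    u = np.full(D, 1.0 / D)                 # uniform distribution over D
    p = alpha * u + (1 - alpha) * y         # smoothed reference distributions
    q = alpha * u + (1 - alpha) * x         # smoothed candidate distributions
    kl = np.sum(p * np.log(p / q), axis=1)  # KL divergence, one value per vertex
    return np.sum(w_f * kl)

y = np.array([[1.0, 0.0], [0.2, 0.8]])     # toy labels on two vertices
x = np.array([[0.9, 0.1], [0.3, 0.7]])
print(smoothed_kl_loss(y, x, np.ones(2), 0.1))
```

Note that the smoothing with the uniform distribution keeps every coordinate positive, so the logarithm is always well defined even for hard (zero-one) labels.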

The reduced problem is also solved using the preconditioned forward-Douglas–Rachford splitting algorithm (see also the corresponding repository).

An example with the smoothed Kullback–Leibler is provided with GNU Octave or Matlab and Python interfaces, on a task of spatial regularization of semantic classification of a 3D point cloud.

[Figure: 3D point cloud labeling example: ground truth, random forest classifier, regularized classification]

Specialization Cp_d0_dist: separable distance and weighted contour length

The base space is ℍ = ℝ^D or Δ_D and the general form is

    F: x ∈ ℝ^{V×D} ↦ f(y, x) + ∑_{(u,v) ∈ E} w^(d0)_{(u,v)} ║x_u − x_v║_0 ,

where y ∈ ℍ^V, f is a loss functional akin to a distance (see below), and ║·║_0 is the ℓ0 pseudo-norm x ↦ 0 if x = 0, 1 otherwise.

The following loss functionals are available, where w^(f) ∈ ℝ^V are weights on vertices and m^(f) ∈ ℝ^D are weights on coordinates.
Weighted quadratic: ℍ = ℝ^D and f(y, x) = ∑_{v ∈ V} w^(f)_v ∑_{d ∈ D} m^(f)_d (x_{v,d} − y_{v,d})²
Weighted smoothed Kullback–Leibler divergence (equivalent to cross-entropy): ℍ = Δ_D and
f(y, x) = ∑_{v ∈ V} w^(f)_v KL_{m^(f)}(α u + (1 − α) y_v, α u + (1 − α) x_v),
where α ∈ ]0,1[, u ∈ Δ_D is the uniform discrete distribution over D, and
KL_{m^(f)}: (p, q) ↦ ∑_{d ∈ D} m^(f)_d p_d log(p_d/q_d).

The reduced problem amounts to averaging, and the split step uses the k-means++ algorithm, sketched below.
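
For reference, here is a minimal sketch of k-means++ seeding (illustrative Python, not the library's implementation): each new center is drawn with probability proportional to the squared distance to the nearest center already chosen, which tends to propose well-spread candidate values.

```python
import numpy as np

# Minimal sketch of k-means++ seeding on a (n, D) array of points.
def kmeanspp_seeds(points, k, rng=np.random.default_rng(0)):
    seeds = [points[rng.integers(len(points))]]  # first seed drawn uniformly
    for _ in range(k - 1):
        # squared distance of each point to its nearest seed so far
        d2 = np.min([np.sum((points - s) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(seeds)

points = np.vstack([np.zeros((5, 2)), np.ones((5, 2))])  # two obvious clusters
print(kmeanspp_seeds(points, 2))  # one seed per cluster, almost surely
```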

When the loss is quadratic, the resulting problem is sometimes coined “minimal partition problem”.

An example with the smoothed Kullback–Leibler is provided with GNU Octave or Matlab and Python interfaces, on a task of spatial regularization of semantic classification of a 3D point cloud.

Directory tree

.   
├── data/         various data for illustration
├── include/      C++ headers, with some doc  
├── octave/       GNU Octave or Matlab code  
│   ├── doc/      some documentation  
│   └── mex/      MEX C interfaces
├── python/       Python code  
│   ├── cpython/  C Python interfaces  
│   └── wrappers/ python wrappers and documentation  
└── src/          C++ sources  

C++ documentation

The C++ classes are documented within the corresponding headers in include/.

GNU Octave or Matlab

The MEX interfaces are documented within dedicated .m files in octave/doc/.
See the script compile_mex.m for typical compilation commands.

The script example_EEG.m exemplifies the use of Cp_d1_ql1b, on a task of brain source identification from electroencephalography.

The script example_tomography.m exemplifies the use of Cp_d1_ql1b, on a task of image reconstruction from tomography.

The scripts example_labeling_3D.m and example_labeling_3D_d0.m exemplify the use of, respectively, Cp_d1_lsx and Cp_d0_dist, on a task of spatial regularization of semantic classification of a 3D point cloud.

Python

The Python interfaces are documented within the corresponding .py files in python/wrappers/.
The numpy package is required.
See the script setup.py for compiling modules with distutils; currently, wrappers assume the libraries are built in a python/bin/ directory, which can be ensured using python setup.py build_ext --build-lib='bin'.

The script example_EEG.py exemplifies the use of Cp_d1_ql1b, on a task of brain source identification with electroencephalography.

The script example_labeling_3D.py exemplifies the use of Cp_d1_lsx, on a task of spatial regularization of semantic classification of a 3D point cloud.

References

L. Landrieu and G. Obozinski, Cut Pursuit: Fast Algorithms to Learn Piecewise Constant Functions on Weighted Graphs, 2017.

H. Raguet and L. Landrieu, Cut-pursuit Algorithm for Regularizing Nonsmooth Functionals with Graph Total Variation, 2018.

Y. Boykov and V. Kolmogorov, An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.

License

This software is under the GPLv3 license.
