juangamella / stars

Python implementation of the StARS algorithm: "Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models" by Han Liu, Kathryn Roeder, Larry Wasserman, 2010

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

Example

This repository contains a Python implementation of the StARS algorithm, from the paper "Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models" by Liu, Kathryn Roeder, Larry Wasserman (https://arxiv.org/pdf/1006.3316.pdf).

Feedback is very welcome!

Requirements

Minimum requirements (see setup.py):

  • Standard library
  • numpy

To run with the Graphical Lasso from Scikit-learn:

  • Standard library
  • numpy
  • sklearn>=0.20

Using the Graphical Lasso estimator from sklearn

The function stars.glasso.fit selects the regularization parameter via StARS, and then runs Scikit-learn's Graphical Lasso over the data.

Parameters:

  • X (np.array): Array containing n observations of p variables. Columns are the observations of a single variable
  • beta (float, optional): Maximum allowed instability between subsample estimates. Defaults to 0.05, the value recommended in the paper.
  • N (int, optional): Number of subsamples, must be a divisor of n. Defaults to the value recommended in the paper, i.e. int(n / np.floor(10 * np.sqrt(n))).
  • start (float, optional): Starting lambda for the search procedure. Defaults to 1.
  • step (float, optional): Initial step at which to increase lambda. Defaults to 1.
  • tol (float, optional): Tolerance of the search procedure, i.e. the search procedure stops when the instability at a given lambda is below tol of beta. Defaults to 1e-5.
  • precision_tol (float, optional): Cutoff value at which nonzero elements of the precision matrix returned by the graphical lasso are considered edges in the graph. Defaults to 1e-4.
  • max_iter (int, optional): Maximum number of iterations for which the search procedure is run. It corresponds to the maximum number of times the estimator is run. Defaults to 20.
  • glasso_params (dict, optional): Dictionary used to pass additional parameters to sklearn.covariance.GraphicalLasso. You can find a list of available parameters here. Defaults to {}.
  • debug (bool, optional): If debugging messages should be printed during execution. Defaults to False.

Returns:

  • estimate (np.array): The adjacency matrix of the resulting graph estimate.

Example (running stars.glasso.fit with default parameters and debug messages):

import numpy as np
import stars, stars.glasso

# Generate data from a neighbourhood graph (page 10 of the paper)
true_precision = stars.neighbourhood_graph(100)
true_covariance = np.linalg.inv(true_precision)
X = np.random.multivariate_normal(np.zeros(100), true_covariance, size=400)

# Run StARS + Graphical lasso
estimate = stars.glasso.fit(X, debug=True)

Example (running stars.glasso.fit with additional parameters for sklearn.covariance.GraphicalLasso):

import numpy as np
import stars, stars.glasso

# Generate data from a neighbourhood graph (page 10 of the paper)
true_precision = stars.neighbourhood_graph(100)
true_covariance = np.linalg.inv(true_precision)
X = np.random.multivariate_normal(np.zeros(100), true_covariance, size=400)

# Set additional parameters for the Graphical Lasso estimator
args = {'max_iter': 100, 'mode': 'lars'}

# Run StARS + Graphical lasso
estimate = stars.glasso.fit(X, glasso_params = args)

Using an estimator of your choice

Estimator function

StARS can be used to select the regularization parameter for other graphical model estimators. To do this, your estimator must be wrapped in a function which takes two arguments:

  • subsamples (np.array): An array containing the subsampled data, of dimension Nxbxp, where N is the number of subsamples, b=n/N and p is the number of variables.
  • lambda (float): The regularization value at which to run the estimator.

It must return a Nxpxp np.array containing the adjacency matrix (0s or 1s) of the estimate for each subsample. There is a complete example below.

Parameters (for stars.fit):

  • X (np.array): Array containing n observations of p variables. Columns are the observations of a single variable
  • estimator (function): Wrapper function for your estimator, as described above.
  • beta (float, optional): Maximum allowed instability between subsample estimates. Defaults to 0.05, the value recommended in the paper.
  • N (int, optional): Number of subsamples, must be divisor of n. Defaults to the value recommended in the paper, i.e. int(n / np.floor(10 * np.sqrt(n))).
  • start (float, optional): Starting lambda in the search procedure. Defaults to 1.
  • step (float, optional): Initial step at which to increase lambda. Defaults to 1.
  • tol (float, optional): Tolerance of the search procedure, i.e. the search procedure stops when the instability at a given lambda is below tol of beta. Defaults to 1e-5.
  • max_iter (int, optional): Maximum number of iterations for which the search procedure is run. It corresponds to the maximum number of times the estimator is run. Defaults to 20.
  • debug (bool, optional): If debugging messages should be printed during execution. Defaults to False.

Returns:

  • estimate (p x p np.array): The adjacency matrix of the resulting graph estimate.

An example:

import numpy as np
import stars

# Define a dummy estimator (returns the same estimate for all subsamples)
def estimator(subsamples, lmbda):
    p = subsamples.shape[2]
    A = np.triu(np.random.uniform(size=(p,p)), k=1)
    A += A.T
    A = A > 0.5
    return np.array([A] * len(subsamples))

# Generate data from a neighbourhood graph (page 10 of the paper)
true_precision = stars.neighbourhood_graph(100)
true_covariance = np.linalg.inv(true_precision)
X = np.random.multivariate_normal(np.zeros(100), true_covariance, size=400)

# Run StARS + Graphical lasso
stars.fit(X, estimator)

About

Python implementation of the StARS algorithm: "Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models" by Han Liu, Kathryn Roeder, Larry Wasserman, 2010

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Python 92.3%Language:Makefile 7.7%