sktime / pcit

Predictive Conditional Independence Testing (PCIT)

with applications in graphical model structure learning

Currently under review! See issues

Description

This package implements a multivariate conditional independence test and an algorithm for learning directed graphs from data, based on the PCIT.

arXiv preprint

Developers

If you would like to contribute, read our contribution guide.

Code Example

For details, see the Examples or the Manual.

There are three main components:

  • MetaEstimator: Estimator class used for independence testing
  • PCIT: Multivariate Conditional Independence Test
  • find_neighbours: Undirected graph skeleton learning algorithm
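
The examples below assume the three components are imported as follows (a sketch: the module paths are an assumption about the package layout, see the Manual):

from pcit.MetaEstimator import MetaEstimator
from pcit.StructureEstimation import PCIT, find_neighbours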

In the following, X, Y and Z can be univariate or multivariate.

from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2

data = load_boston()['data']
X = data[:, 1:2]
Y = data[:, 2:4]
Z = data[:, 4:10]

Testing whether X is independent of Y at the 0.01 confidence level:

PCIT(X, Y, confidence = 0.01)

The direction of the prediction is X -> Y, so the p-values correspond to the hypothesis that adding X does not improve the prediction of Y (one p-value for each dimension of Y). If the parameter 'symmetric' is set to True (the default), both directions are tested.
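
If only the direction X -> Y is of interest, the symmetric testing can be switched off. A minimal sketch; that the return value contains the adjusted p-values is an assumption here, see the Manual for the exact output:

# One-directional test of X -> Y only
# Assumption: the return value includes the multiplicity-adjusted
# p-values, one per dimension of Y
result = PCIT(X, Y, confidence = 0.01, symmetric = False)
print(result)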

Testing whether X is independent of Y, conditional on Z:

PCIT(X, Y, z = Z)

Testing whether X is independent of Y, conditional on Z, using a custom MetaEstimator that multiplexes over a manually chosen set of estimators:

from sklearn.linear_model import (RidgeCV, LassoCV,
                                  SGDClassifier, LogisticRegression)

regressors = [RidgeCV(), LassoCV()]
classifiers = [SGDClassifier(), LogisticRegression()]
custom_estim = MetaEstimator(method = 'multiplexing',
                             estimators = (regressors, classifiers))

PCIT(X, Y, z = Z, estimator = custom_estim)
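
The custom estimator can plausibly also be passed to the skeleton learning algorithm described next; a sketch, under the assumption that find_neighbours accepts the same 'estimator' keyword as PCIT:

# Assumption: find_neighbours forwards the 'estimator' keyword to PCIT
find_neighbours(data, estimator = custom_estim)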

Learning the undirected skeleton of the graphical model over the columns of X:

X = load_boston()['data']
find_neighbours(X)
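
A sketch of inspecting the result, under the assumption that find_neighbours returns one collection of neighbour indices per variable (the actual return type may differ, see the Manual):

# Assumption: neighbours[i] holds the indices of the estimated
# neighbours of column i in the undirected skeleton
neighbours = find_neighbours(X)
for i, nb in enumerate(neighbours):
    print('column', i, 'neighbours:', sorted(nb))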

Motivation

Conditional and multivariate independence testing are difficult problems that lack a straightforward, scalable and easy-to-use solution. This project connects the classical independence testing task to the supervised learning workflow. This has the following advantages:

  • The link to the heavily researched supervised learning workflow lets classical independence testing grow in power as a side effect of improvements in supervised learning methodology
  • Established hyperparameter-tuning practice from supervised prediction removes the need for the manual tuning and model choices prevalent in current methodology
  • As a wrapper around the sklearn package, the PCIT is easy to use and adjust

Installation

The package can be installed through pip:

pip install pcit

The main dependency is scikit-learn, which the package wraps.

Tests

Three tests can be run:

Test_PCIT_Power: Tests the power for increasing sample sizes on a difficult v-structured problem. MATLAB code for the same problem, for comparison with the "Kernel Conditional Independence Test", can be found here.

Test_PCIT_Consistency: Assesses the consistency of the test under perturbations of the data.

Test_Structure: Assesses the power and false-discovery-rate control of the graphical model structure learning algorithm.
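
Assuming the test files sit at the top level of the repository, they can be run as plain scripts, e.g.:

python Test_PCIT_Power.py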

License

MIT License
