sktime / pcit

Predictive Conditional Independence Testing (PCIT)

with applications in graphical model structure learning

Currently under review! See issues

Description

This package implements a multivariate conditional independence test and an algorithm for learning directed graphs from data, based on the PCIT.

arXiv preprint

Developers

If you would like to contribute, read our contribution guide.

Code Example

For details, see the Examples or the Manual.

There are three main components:

  • MetaEstimator: Estimator class used for independence testing
  • PCIT: Multivariate Conditional Independence Test
  • find_neighbours: Undirected graph skeleton learning algorithm
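
The examples below assume the three components are imported as follows (a sketch: the module paths are an assumption about the package layout, see the Manual):

from pcit.MetaEstimator import MetaEstimator
from pcit.StructureEstimation import PCIT, find_neighbours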

In the following, X, Y and Z can be univariate or multivariate.

from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2

data = load_boston()['data']
X = data[:, 1:2]
Y = data[:, 2:4]
Z = data[:, 4:10]

Testing whether X is independent of Y at the 0.01 confidence level:

PCIT(X, Y, confidence = 0.01)

The direction of the prediction is X -> Y, so the p-values correspond to the hypothesis that adding X does not improve the prediction of Y (one p-value for each dimension of Y). If the parameter 'symmetric' is set to True (the default), both directions are tested.
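
If only the direction X -> Y is of interest, the symmetric testing can be switched off. A minimal sketch; that the return value contains the adjusted p-values is an assumption here, see the Manual for the exact output:

# One-directional test of X -> Y only
# Assumption: the return value includes the multiplicity-adjusted
# p-values, one per dimension of Y
result = PCIT(X, Y, confidence = 0.01, symmetric = False)
print(result)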

Testing whether X is independent of Y, conditional on Z:

PCIT(X, Y, z = Z)

Testing whether X is independent of Y, conditional on Z, using a custom MetaEstimator that multiplexes over a manually chosen set of estimators:

from sklearn.linear_model import (RidgeCV, LassoCV,
                                  SGDClassifier, LogisticRegression)

regressors = [RidgeCV(), LassoCV()]
classifiers = [SGDClassifier(), LogisticRegression()]
custom_estim = MetaEstimator(method = 'multiplexing',
                             estimators = (regressors, classifiers))

PCIT(X, Y, z = Z, estimator = custom_estim)
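
The custom estimator can plausibly also be passed to the skeleton learning algorithm described next; a sketch, under the assumption that find_neighbours accepts the same 'estimator' keyword as PCIT:

# Assumption: find_neighbours forwards the 'estimator' keyword to PCIT
find_neighbours(data, estimator = custom_estim)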

Learning the undirected skeleton of the graphical model over the columns of X:

X = load_boston()['data']
find_neighbours(X)
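
A sketch of inspecting the result, under the assumption that find_neighbours returns one collection of neighbour indices per variable (the actual return type may differ, see the Manual):

# Assumption: neighbours[i] holds the indices of the estimated
# neighbours of column i in the undirected skeleton
neighbours = find_neighbours(X)
for i, nb in enumerate(neighbours):
    print('column', i, 'neighbours:', sorted(nb))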

Motivation

Conditional and multivariate independence testing are difficult problems that lack a straightforward, scalable and easy-to-use solution. This project connects the classical independence testing task to the supervised learning workflow. This has the following advantages:

  • The link to the heavily researched supervised learning workflow lets classical independence testing grow in power as a side effect of improvements in supervised learning methodology
  • Established hyperparameter-tuning practice from supervised prediction removes the need for the manual tuning and model choices prevalent in current methodology
  • As a wrapper around the sklearn package, the PCIT is easy to use and adjust

Installation

The package can be installed through pip:

pip install pcit

The main dependency is scikit-learn, which the package wraps.

Tests

Three tests can be run:

Test_PCIT_Power: Tests the power for increasing sample sizes on a difficult v-structured problem. MATLAB code for the same problem, for comparison with the "Kernel Conditional Independence Test", can be found here.

Test_PCIT_Consistency: Assesses the consistency of the test under perturbations of the data.

Test_Structure: Assesses the power and false-discovery-rate control of the graphical model structure learning algorithm.
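
Assuming the test files sit at the top level of the repository, they can be run as plain scripts, e.g.:

python Test_PCIT_Power.py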

License

MIT License
