boschresearch / causalAssembly

Home Page: https://boschresearch.github.io/causalAssembly/

causalAssembly

License: AGPL v3

This repo provides details regarding causalAssembly, a causal discovery benchmark data tool based on complex production data.

Authors

Maintainer: Martin Roth (CC/MFD2)

How to install

The package can be installed as follows:

pip install git+https://github.com/boschresearch/causalAssembly.git

How to use

The examples below show how causalAssembly can be used. See the documentation for more in-depth details and further usage.

If you want to train a distributional random forest (DRF) [1] yourself, you need an R installation as well as the corresponding drf R package. Sampling was first proposed in [2].

Note: For Windows users, the Python package rpy2 might cause issues. Please consult the rpy2 issue tracker on GitHub.
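Before fitting DRFs, it can help to confirm that the prerequisites named above are visible from Python. The following is a minimal sketch, not part of causalAssembly: the function name check_drf_prerequisites is hypothetical, and it assumes that Rscript is on the PATH whenever R is installed.

```python
import importlib.util
import shutil
import subprocess


def check_drf_prerequisites() -> dict:
    """Report which pieces needed for DRF fitting appear to be available.

    Hypothetical helper for illustration; not part of causalAssembly.
    """
    status = {
        # rpy2 is the Python-to-R bridge mentioned in the note above
        "rpy2": importlib.util.find_spec("rpy2") is not None,
        # Assumption: Rscript is on the PATH if R is installed
        "R": shutil.which("Rscript") is not None,
        "drf": False,
    }
    if status["R"]:
        # Ask R whether the drf package can be loaded; exit code 0 means yes
        r_code = (
            'quit(save = "no", '
            'status = if (requireNamespace("drf", quietly = TRUE)) 0 else 1)'
        )
        result = subprocess.run(
            ["Rscript", "-e", r_code],
            capture_output=True,
            check=False,
        )
        status["drf"] = result.returncode == 0
    return status


if __name__ == "__main__":
    for name, available in check_drf_prerequisites().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

If any entry is reported as missing, the DRF fitting example is unlikely to run as written.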

In order to fit DRFs and sample data, consider the following example:

import pandas as pd

from causalAssembly.models_dag import ProductionLineGraph
from causalAssembly.drf_fitting import fit_drf

seed = 2023
n_select = 500

assembly_line_data = ProductionLineGraph.get_data()

# take subsample for demonstration purposes
assembly_line_data = assembly_line_data.sample(
    n_select, random_state=seed, replace=False
)

# load in ground truth
assembly_line = ProductionLineGraph.get_ground_truth()

# fit drf and sample for entire line
assembly_line.drf = fit_drf(assembly_line, data=assembly_line_data)
assembly_line_sample = assembly_line.sample_from_drf(size=n_select)

# fit drf and sample for station3
assembly_line.Station3.drf = fit_drf(assembly_line.Station3, data=assembly_line_data)
station3_sample = assembly_line.Station3.sample_from_drf(size=n_select)
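Conceptually, sampling from a DAG with a fitted conditional model per node can be done by ancestral sampling: visit the nodes in topological order and draw each one given the values already drawn for its parents. The sketch below illustrates that idea with the standard library only. It is not causalAssembly's implementation; the toy graph, the function names, and the linear-Gaussian conditional (a stand-in for a fitted DRF) are all assumptions made for illustration.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy DAG: each node maps to the list of its parents
parents = {
    "a": [],
    "b": ["a"],
    "c": ["a", "b"],
}


def sample_node(parent_values: list[float], rng: random.Random) -> float:
    """Toy conditional: sum of the parent values plus standard Gaussian noise.

    Stand-in for a fitted conditional distribution such as a DRF.
    """
    return sum(parent_values) + rng.gauss(0.0, 1.0)


def ancestral_sample(parents: dict, size: int, seed: int = 2023) -> list[dict]:
    """Draw `size` joint samples by visiting nodes in topological order."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors, so parents come first
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(size):
        row = {}
        for node in order:
            row[node] = sample_node([row[p] for p in parents[node]], rng)
        rows.append(row)
    return rows


samples = ancestral_sample(parents, size=5)
for row in samples:
    print({name: round(value, 3) for name, value in row.items()})
```

Because every node is drawn only after its parents, the joint sample respects the factorization implied by the graph.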

The ProductionLineGraph class can also be used to generate completely random DAGs that follow an assembly line logic. Consider the following example:

from causalAssembly.models_dag import ProductionLineGraph

example_line = ProductionLineGraph()

# first cell with two randomly generated modules
example_line.new_cell(name='Station1')
example_line.Station1.add_random_module()
example_line.Station1.add_random_module()

# second cell with a single random module of five nodes
example_line.new_cell(name='Station2')
example_line.Station2.add_random_module(n_nodes=5)

# third cell, flagged via is_eol
example_line.new_cell(name='Station3', is_eol=True)
example_line.Station3.add_random_module()
example_line.Station3.add_random_module()

# connect the cells with forward edges
example_line.connect_cells(forward_probs=[0.1])

# plot the resulting graph
example_line.show()
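The "assembly line logic" can be read as a constraint on edge direction: nodes belong to ordered cells, and an edge may only point from an earlier node to a later one. The following minimal sketch generates such a graph with the standard library. It is an illustration under stated assumptions, not the algorithm used by ProductionLineGraph: the function random_line_dag and the parameters within_prob and forward_prob are hypothetical.

```python
import random
from graphlib import TopologicalSorter


def random_line_dag(cell_sizes, within_prob=0.3, forward_prob=0.1, seed=2023):
    """Return (cells, edges) for a random DAG whose edges never point backwards.

    cell_sizes maps a cell name to its number of nodes, in line order.
    All names and probabilities here are hypothetical.
    """
    rng = random.Random(seed)
    cells = {
        cell: [f"{cell}_{i}" for i in range(n)] for cell, n in cell_sizes.items()
    }
    # Global node order: cell by cell, following the order of the line
    ordered = [node for nodes in cells.values() for node in nodes]
    cell_of = {node: cell for cell, nodes in cells.items() for node in nodes}

    edges = []
    for i, source in enumerate(ordered):
        # Only later nodes are candidate targets, which rules out cycles
        for target in ordered[i + 1:]:
            same_cell = cell_of[source] == cell_of[target]
            prob = within_prob if same_cell else forward_prob
            if rng.random() < prob:
                edges.append((source, target))
    return cells, edges


cells, edges = random_line_dag({"Station1": 4, "Station2": 5, "Station3": 3})

# Sanity check: static_order() raises CycleError unless the graph is acyclic
predecessors = {node: set() for nodes in cells.values() for node in nodes}
for source, target in edges:
    predecessors[target].add(source)
order = list(TopologicalSorter(predecessors).static_order())
print(f"{len(order)} nodes, {len(edges)} edges")
```

Restricting candidate edges to later nodes makes acyclicity hold by construction, so no cycle removal step is needed afterwards.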

causalAssembly also allows you to create a functional causal model (FCM) and sample from it after specifying noise distributions. A simple example of creating and sampling from a handcrafted FCM:

from causalAssembly.models_fcm import HandCrafted_FCM
from sympy import symbols, Eq
from sympy.stats import Uniform

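# symbolic variables of the model
x, y, z = symbols('x,y,z')

# x is a source node driven by uniform noise
eq_x = Eq(x, Uniform('x', left=-1, right=1))
# y and z are functions of their parents
eq_y = Eq(y, 2*x**2 + 3)
eq_z = Eq(z, 9*y*x)

eq_list = [eq_x, eq_y, eq_z]

example_fcm = HandCrafted_FCM(name='example_fcm', seed=2023)
example_fcm.input_fcm(eq_list)

# inspect the edges of the resulting graph
print(example_fcm.graph.edges())

# draw 10 samples with added noise
example_df = example_fcm.draw(size=10, add_noise=True, snr=2/3)
example_df.head()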
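To make the sampling step concrete, the same three equations can be simulated by hand with the standard library. This is a sketch under assumptions and does not reproduce HandCrafted_FCM: here snr is taken to mean var(signal) / var(noise), the added noise is Gaussian, and the noisy value of y is passed on to z. The library may define each of these differently.

```python
import random
import statistics


def add_noise(signal, snr, rng):
    """Add zero-mean Gaussian noise with variance var(signal) / snr.

    Assumption: snr is the ratio of signal variance to noise variance.
    """
    noise_sd = (statistics.pvariance(signal) / snr) ** 0.5
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def draw_fcm(size, snr=None, seed=2023):
    """Sample x ~ Uniform(-1, 1), y = 2*x**2 + 3, z = 9*y*x.

    If snr is given, noise is added to the non-source nodes y and z.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    y = [2 * xi**2 + 3 for xi in x]
    if snr is not None:
        y = add_noise(y, snr, rng)
    # Assumption: z is computed from the (possibly noisy) parent y
    z = [9 * yi * xi for yi, xi in zip(y, x)]
    if snr is not None:
        z = add_noise(z, snr, rng)
    return {"x": x, "y": y, "z": z}


sample = draw_fcm(size=10, snr=2 / 3)
for name, values in sample.items():
    print(name, [round(v, 2) for v in values[:3]])
```

With snr=None the function returns the noise-free structural equations, which is convenient for checking that the equations were entered correctly.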

References

[1] Ćevid, D., Michel, L., Näf, J., Bühlmann, P., & Meinshausen, N. (2022). Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression. Journal of Machine Learning Research, 23(333), 1-79.

[2] Gamella, J. L., Taeb, A., Heinze-Deml, C., & Bühlmann, P. (2022). Characterization and greedy learning of Gaussian structural causal models under unknown noise interventions. arXiv preprint arXiv:2211.14897.

How to test

We use pytest; the test suite can be executed locally via:

python -m pytest
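A test collected by this command might look like the following minimal sketch. The helper is_acyclic and both test functions are hypothetical and not taken from causalAssembly's test suite; by default pytest collects functions prefixed with test_ from files named test_*.py or *_test.py.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(predecessors: dict) -> bool:
    """Return True if the graph (node -> predecessors) has no directed cycle."""
    try:
        # static_order() raises CycleError when a cycle is present
        list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


def test_chain_is_acyclic():
    # a -> b -> c
    assert is_acyclic({"a": [], "b": ["a"], "c": ["b"]})


def test_cycle_is_detected():
    # a -> b -> a
    assert not is_acyclic({"a": ["b"], "b": ["a"]})
```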

How to contribute?

Please feel free to contact one of the authors in case you wish to contribute.

Third-Party Licenses

Runtime dependencies

| Name | License | Type |
|---|---|---|
| numpy | BSD-3-Clause License | Dependency |
| scipy | BSD-3-Clause License | Dependency |
| pandas | BSD-3-Clause License | Dependency |
| networkx | BSD-3-Clause License | Dependency |
| matplotlib | Other | Dependency |
| sympy | BSD-3-Clause License | Dependency |
| pydantic | MIT License | Dependency |
| distinctipy | MIT License | Dependency |
| rpy2 | GNU General Public License v2.0 | Dependency |

Development dependency

| Name | License | Type |
|---|---|---|
| mike | BSD-3-Clause License | Dependency |
| mkdocs | BSD-2-Clause License | Dependency |
| mkdocs-material | MIT License | Dependency |
| mkdocstrings[python] | ISC License | Dependency |
| ruff | MIT License | Dependency |
| pytest | MIT License | Dependency |
| pip-tools | BSD-3-Clause License | Dependency |
