This repository contains the code to reproduce the results comparing ABCD to A-ICP in the paper Active Invariant Causal Prediction: Experiment Selection through Stability, by Juan L Gamella and Christina Heinze-Deml.
The repository is forked from agrawalraj/active_learning, which contains the original implementation of ABCD. It makes minor changes to that code so that it runs and produces results for the experiments comparing ABCD to A-ICP. Changes to the original code are marked with a comment of the form #A-ICP paper: *.
You will need at least
Python 3.6
R 3.6
The original installation procedure didn't work for us, and some Python dependencies were missing from the venv. This is what we did to get the code to run.
In an R terminal, run
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
BiocManager::install(c("Rgraphviz", "RBGL"))
install.packages('pcalg', repos='http://cran.us.r-project.org')
install.packages('gRbase', repos='http://cran.us.r-project.org')
Then, in a shell at the root of the repository, run
cd new/
bash make_venv.sh
source venv/bin/activate
pip install pyaml tqdm xarray causaldag scipy
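Since some dependencies were missing for us, it can help to confirm that the packages are visible from inside the activated venv before starting a long run. The following is a minimal sketch, not part of the repository; it assumes each pip package above is importable under the same name as its package name, and the helper missing_modules is hypothetical.

```python
import importlib.util

# Import names assumed to match the pip package names installed above.
REQUIRED = ("pyaml", "tqdm", "xarray", "causaldag", "scipy")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every assumed module was found.
print("missing:", missing_modules())
```

If the list is not empty, installing the named packages with pip inside the venv is the first thing to try.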
The dataset used to run the experiments is generated with the code in the A-ICP implementation (see Reproducing experiments in that repo's README) and then copied to the new/data/ directory. The dataset is a directory structure (here dataset/). Unfortunately, running the code renders the dataset unusable for other runs, so we have to copy it (I like to keep dataset/ as the "master copy"). For the 12 experiments we copy it 12 times, plus once more to test that everything works:
cd data/
cp -r dataset dataset_0
cp -r dataset dataset_1
cp -r dataset dataset_2
cp -r dataset dataset_3
cp -r dataset dataset_4
cp -r dataset dataset_5
cp -r dataset dataset_6
cp -r dataset dataset_7
cp -r dataset dataset_8
cp -r dataset dataset_9
cp -r dataset dataset_10
cp -r dataset dataset_11
cp -r dataset dataset_test
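The copies above can also be made with a short script, which is convenient when the dataset has to be regenerated. This is a sketch under the naming used above (dataset, dataset_0 to dataset_11, dataset_test); the function copy_dataset and its parameters are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def copy_dataset(data_dir, master="dataset", n_runs=12, extra=("test",)):
    """Copy data_dir/master to data_dir/master_<suffix>, skipping copies that already exist."""
    data_dir = Path(data_dir)
    suffixes = [str(i) for i in range(n_runs)] + list(extra)
    created = []
    for suffix in suffixes:
        target = data_dir / f"{master}_{suffix}"
        if not target.exists():
            # copytree copies the whole directory structure of the master copy
            shutil.copytree(data_dir / master, target)
            created.append(target.name)
    return created


# Example, run from inside new/data/:
#     copy_dataset(".")
```

Existing copies are skipped rather than overwritten, so a copy that a previous run has already consumed must be deleted by hand before it is recreated.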
I would then run a small experiment once, from the new/ directory, to see if everything works:
python run_experiments.py -n 10 -b 2 -k 1 --boot 20 -s 7 --folder dataset_test --strategy entropy --starting-samples 100
If it doesn't crash after 1 minute, kill it and run the rest. To run the ABCD algorithm four times (with different initial random seeds) at each of 50, 100 and 1000 observational samples:
cd new/
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_0 --strategy entropy --starting-samples 50
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_1 --strategy entropy --starting-samples 50
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_2 --strategy entropy --starting-samples 50
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_3 --strategy entropy --starting-samples 50
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_4 --strategy entropy --starting-samples 100
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_5 --strategy entropy --starting-samples 100
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_6 --strategy entropy --starting-samples 100
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_7 --strategy entropy --starting-samples 100
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_8 --strategy entropy --starting-samples 1000
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_9 --strategy entropy --starting-samples 1000
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_10 --strategy entropy --starting-samples 1000
python run_experiments.py -n 500 -b 50 -k 1 --boot 100 -s 7 --folder dataset_11 --strategy entropy --starting-samples 1000
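The twelve commands above differ only in the dataset folder and the number of starting samples, so they can be generated rather than typed. The sketch below only reproduces the command lines listed above; the helper build_commands is hypothetical and not part of the repository.

```python
import itertools


def build_commands(sample_sizes=(50, 100, 1000), runs_per_size=4):
    """Build one argument list per run, using a fresh dataset copy for each."""
    commands = []
    runs = itertools.product(sample_sizes, range(runs_per_size))
    for idx, (size, _) in enumerate(runs):
        commands.append([
            "python", "run_experiments.py",
            "-n", "500", "-b", "50", "-k", "1", "--boot", "100", "-s", "7",
            "--folder", f"dataset_{idx}",
            "--strategy", "entropy",
            "--starting-samples", str(size),
        ])
    return commands


for cmd in build_commands():
    print(" ".join(cmd))
```

To launch the runs one after another instead of printing them, each list can be passed to subprocess.run(cmd, check=True) from inside the new/ directory.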
Parallelization
The code automatically runs on as many cores as are made available to it, minus one.
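To get a rough idea of how many worker processes to expect on a given machine, the rule "available cores minus one" can be sketched as follows. This is only an illustration of that rule and an assumption about how availability is counted; it is not taken from the repository's code, and worker_count is a hypothetical helper.

```python
import os


def worker_count():
    """Number of cores available to this process, minus one, but never below one."""
    try:
        # On Linux this respects CPU affinity limits set for the process.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total core count.
        available = os.cpu_count() or 1
    return max(1, available - 1)


print("expected workers:", worker_count())
```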
Results
The results are pickled into the new/ directory. The filenames contain a timestamp and the parameters used, e.g.
pp_1589260274_n_samples:500_n_batches:50_max_interventions:1_strategy:entropy_intervention_strength:5.0_starting_samples:100_target:0_intervention_type:gauss_target_allowed:True.pickle
They can be plotted with the plots_abcd.ipynb notebook in the A-ICP repository.
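When sorting through many result files, it can be handy to recover the parameters from a filename without opening the pickle. The sketch below parses the example filename shown above; it assumes that parameter values never contain underscores, which holds for that example but is not guaranteed in general. The function parse_result_name is hypothetical and not part of either repository.

```python
import re


def parse_result_name(filename):
    """Split a result filename into its timestamp and a dict of parameter strings."""
    match = re.fullmatch(r"pp_(\d+)_(.*)\.pickle", filename)
    if match is None:
        raise ValueError(f"unexpected result filename: {filename}")
    timestamp, body = match.groups()
    # Keys may contain underscores; values are assumed not to.
    params = dict(re.findall(r"_?([A-Za-z][A-Za-z_]*):([^_]+)", body))
    return int(timestamp), params


example = (
    "pp_1589260274_n_samples:500_n_batches:50_max_interventions:1"
    "_strategy:entropy_intervention_strength:5.0_starting_samples:100"
    "_target:0_intervention_type:gauss_target_allowed:True.pickle"
)
timestamp, params = parse_result_name(example)
print(timestamp, params["strategy"], params["starting_samples"])
```

All values are returned as strings, so numeric parameters such as starting_samples need to be converted before they are compared or sorted.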