MissTiny / partially-interpretable-estimators


πŸ“‹ This README.md mainly describes how to train and evaluate a single model. To see the actual tuning process for the 5-fold cross-validation PIE method, please see the README file in the Argon folder.

Partially Interpretable Estimators

This repository is the official implementation of Partially Interpretable Estimators.

Requirements

Our PIE model is implemented in R. However, some baselines presented in the paper are implemented in Python; navigate to the baseline folder if needed.

To install requirements for PIE model:

Rscript requirements.R
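Before installing the requirements, it can help to confirm that R is on your PATH. This check is illustrative and not part of the repository:

```shell
# Check that the Rscript interpreter is available before installing requirements.
if command -v Rscript >/dev/null 2>&1; then
  echo "Rscript found"
else
  echo "R is not installed or not on PATH"
fi
```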

Training

Please load the PIE algorithm functions before training:

# Save all PIE algorithm functions in a functions.RData file, which is loaded by the training scripts
Rscript load_functions.R

Regression

To train a regression model from the paper, run this command:

# Train_Regression.R generates a train.RData file containing the trained model
Rscript Train_Regression.R dataset.RData lambda1 lambda2 stepsize iteration eta nrounds fold

For example, if the dataset is saved in CASP.RData, lambda1 = 0.01, lambda2 = 1, stepsize = 0.1, iteration = 500, eta = 0.05, nrounds = 200, and the model is trained on fold 1, the call would be:

Rscript Train_Regression.R CASP.RData 0.01 1 0.1 500 0.05 200 1

Classification

To train a classification model from the paper, run this command:

# Train_Classification.R generates a train.RData file containing the trained model
Rscript Train_Classification.R dataset.RData lambda1 lambda2 gamma iteration nrounds stepsize tree_nrounds fold

For example, if the dataset is saved in CASP.RData, lambda1 = 0.01, lambda2 = 1, gamma = 0.05, iteration = 500, nrounds = 200, stepsize = 1, tree_nrounds = 200, and the model is trained on fold 1, the call would be:

Rscript Train_Classification.R CASP.RData 0.01 1 0.05 500 200 1 200 1

πŸ“‹ The description above covers training a single model. However, we applied 5-fold cross-validation to eliminate bias. Please refer to the last section and the Argon folder for the detailed training code and commands.
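As a sketch of how the single-fold command above extends to all five folds (the actual cross-validation scripts live in the Argon folder; the hyperparameter values here are just the example values from the regression section):

```shell
# Sketch: generate the training command for each of the 5 folds.
# echo is used so the commands can be previewed; remove it to actually run them
# (running requires R and the repository files).
for fold in 1 2 3 4 5; do
  echo "Rscript Train_Regression.R CASP.RData 0.01 1 0.1 500 0.05 200 $fold"
done
```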

Evaluation

Regression

To evaluate a trained regression model, run:

Rscript Evaluate_Regression.R model.RData

Evaluate_Regression.R loads the train.RData file generated by Train_Regression.R and makes predictions with the trained model on both the validation and the real test datasets. So, after running Train_Regression.R, call:

Rscript Evaluate_Regression.R train.RData
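Putting the steps together, the full regression workflow (load functions, train, evaluate) can be sketched as follows; echo is used so the sequence can be previewed without R installed:

```shell
# Sketch of the end-to-end regression workflow, in order.
# Remove the echo prefixes to actually execute each step.
echo "Rscript load_functions.R"
echo "Rscript Train_Regression.R CASP.RData 0.01 1 0.1 500 0.05 200 1"
echo "Rscript Evaluate_Regression.R train.RData"
```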

Classification

To evaluate a trained classification model, run:

Rscript Evaluate_Classification.R model.RData

Evaluate_Classification.R loads the train.RData file generated by Train_Classification.R and makes predictions with the trained model on both the validation and the real test datasets. So, after running Train_Classification.R, call:

Rscript Evaluate_Classification.R train.RData

πŸ“‹ The steps above describe how to evaluate a single model.

Pre-trained Models and Data Splits

Since we applied 5-fold cross-validation on each dataset, the final result reported in the paper is the average performance over the five folds, with the standard deviation across folds shown as error bars. For each fold, we select the model with the best performance on the validation data and then predict on the test data. Consequently, the result computed from a given model may differ from the result in the paper: the results reported in this repository come from a single model, not the average of five models.
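To illustrate how per-fold results are aggregated into the reported numbers, here is a small sketch computing the mean and the sample standard deviation; the five values are placeholders, not actual results:

```shell
# Compute the mean and sample standard deviation of five placeholder fold results.
printf '%s\n' 0.51 0.53 0.55 0.52 0.54 | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    sd = sqrt((sumsq - n * mean * mean) / (n - 1))
    printf "mean=%.3f sd=%.3f\n", mean, sd
  }'
# prints: mean=0.530 sd=0.016
```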

The 5-fold data splits are also provided in the RData files.

Due to GitHub's storage limits, we are unable to provide all models in this repository. Other models can be provided upon request.

You can download pre-trained RData model files from the Pre-trained Models folder.

Results

Result Associated with Given Pre-trained Model

Our model achieves the following performance with the given example pre-trained models:

Note: The following results differ from those in the paper because the paper reports averages over the 5 folds (5 models), while the results below are from a single model.

Dataset       PIE - RPE   PIE - pi
CBM           .000        1.00
energyp       .458        .267
parkinsons    .002        .900
winequality   .532        .566
blog          .530        .315
crime         .043        .973
glucose       .102        .982

5-fold Cross-Validation PIE Model Training

To fully replicate the training process described in the paper, please refer to the README file in the Argon folder.

Baseline

Please refer to the Baseline folder.

Contributing

πŸ“‹ Pick a licence and describe how to contribute to your code repository.
