Paper: Explanations Based on Item Response Theory (eXirt): A Model-Specific Method to Explain Tree-Ensemble Model in Trust Perspective.
Authors:
José Ribeiro - site: https://sites.google.com/view/jose-sousa-ribeiro
Lucas Cardoso - site: http://lattes.cnpq.br/9591352011725008
Raíssa Silva - site: https://sites.google.com/site/silvarailors
Vitor Cirilo - site: https://sites.google.com/site/vitorciriloaraujosantos/
Níkolas Carneiro - site: https://br.linkedin.com/in/nikolas-carneiro-62b6568
Ronnie Alves (Leader) - site: https://sites.google.com/site/alvesrco
Solutions based on tree-ensemble models are a strong alternative for real-world prediction problems, but these models are considered black boxes, which hinders their applicability in sensitive contexts (such as health and safety). Explainable Artificial Intelligence (XAI) aims to develop techniques that generate explanations for black-box models, since these models are normally not self-explanatory. Methods such as CIU, Dalex, ELI5, LOFO, SHAP, and Skater were proposed to explain black-box models through global feature-relevance rankings which, based on different methodologies, indicate how the model's inputs explain its predictions. This research presents an innovative XAI method, called eXirt, capable of explaining tree-ensemble models based on Item Response Theory (IRT). In this context, 41 datasets, 4 tree-ensemble algorithms (Light Gradient Boosting, CatBoost, Random Forest, and Gradient Boosting), and 7 XAI methods (including eXirt) were used to generate explanations. In a first set of analyses, the 164 global feature-relevance ranks generated by eXirt were compared with the 984 ranks produced by the other XAI methods in the literature, showing that the new method generates explanations that differ from those of existing methods. In a second analysis, local and global explanations exclusive to eXirt were presented that help in understanding model trust, since they expose particularities of the model regarding difficulty (whether the model had difficulty predicting the test dataset), discrimination (whether the model finds the test dataset discriminative), and guessing (whether the model got the test dataset right by chance). Thus, it was verified that eXirt is able to generate global explanations of tree-ensemble models, as well as local and global explanations of models through IRT, showing how this consolidated theory can be used in machine learning to obtain explainable and reliable models.
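For context, the difficulty, discrimination, and guessing values mentioned above are the item parameters of Item Response Theory. A standard three-parameter logistic (3PL) formulation is shown here for reference; the paper's exact parameterization may differ:

```latex
% 3PL model: probability that respondent j answers item i correctly,
% where b_i = difficulty, a_i = discrimination, c_i = guessing,
% and \theta_j = the respondent's ability.
P(U_{ij} = 1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta_j - b_i)}}
```

In eXirt's setting, as described in the abstract above, the test instances play the role of items and the model under explanation plays the role of a respondent, so the estimated item parameters describe how the model behaves on the test dataset.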
Note: This repository contains all additional information from the article "Explanations Based on Item Response Theory (eXirt): A Model-Specific Method to Explain Tree-Ensemble Model in Trust Perspective", for reproducibility purposes.
Description for execution:
All data regarding the reproducibility of this work can be found in this repository.
- Supplementary Material Based on Illustrations: supplementary material with additional illustrations referring to the paper;
- workspace_cluster_0.zip: all datasets, performance graphs, models, and analyses coming from cluster 0;
- workspace_cluster_1.zip: all datasets, performance graphs, models, and analyses coming from cluster 1;
- workspace_cluster_2.zip: all datasets, performance graphs, models, and analyses coming from cluster 2;
- workspace_cluster_3.zip: all datasets, performance graphs, models, and analyses coming from cluster 3;
- eXirt_pipeline_v0_3_2_m1_to_m4.ipynb: all the source code used to run the experiments presented in this research. Note that this notebook is commented, documented, and separated into sections for easier understanding and execution;
- pipeline_xai.py (local version): the same experiment source code as a standalone script. Due to instability introduced by recent updates to the Colab platform, we also provide the .py code to run locally on a Windows machine;
- eXirt_analisys_of_datasets.ipynb: analyses of the item parameter values for specific datasets;
- eXirt_simple_execution.ipynb: a simple execution of eXirt;
- eXirt_simple_execution_import.ipynb: a simple execution of eXirt using the package import (see the usage sketch after this list);
- https://pypi.org/project/eXirt/: the eXirt Python distribution repository;
- df_dataset_properties.csv: dataset with all 15 properties analyzed in the Multiple Correspondence Analysis (MCA);
- df_dataset_properties_norm.csv: normalized dataset with the same 15 MCA properties;
- df_dataset_properties_binarized.csv: binarized dataset with the same 15 MCA properties (an MCA sketch is given below);
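As a starting point for the notebooks above, the sketch below installs eXirt from PyPI, trains one of the tree-ensemble models used in the paper, and requests a global feature-relevance rank. The explainer class and method names are assumptions, not the confirmed interface; eXirt_simple_execution_import.ipynb documents the actual API.

```python
# pip install eXirt scikit-learn
# Minimal sketch, assuming the eXirt package exposes an explainer class;
# see eXirt_simple_execution_import.ipynb for the authoritative API.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One of the four tree-ensemble algorithms evaluated in the paper.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical call: class and method names below are assumptions.
from eXirt import eXirt as xirt
explainer = xirt.eXirt()
attributes, scores = explainer.explainRankByEXirt(
    model, X_train, X_test, y_train, y_test, 'gradient_boosting'
)
print(attributes)  # global feature-relevance rank (most to least relevant)
```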
To run the .ipynb notebooks, Google Colab is suggested for a faster and smoother execution of the tool.
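The three df_dataset_properties*.csv files feed the Multiple Correspondence Analysis mentioned in the list above. A minimal sketch of such an analysis, assuming the third-party prince library (the paper does not state which MCA implementation was used):

```python
# pip install prince pandas
import pandas as pd
import prince  # assumed library choice; any MCA implementation would do

# Binarized dataset properties shipped in this repository;
# cast to str so MCA treats the binary flags as categories.
df = pd.read_csv('df_dataset_properties_binarized.csv').astype(str)

# Fit a two-component MCA and project the datasets onto it.
mca = prince.MCA(n_components=2)
mca = mca.fit(df)
coordinates = mca.row_coordinates(df)  # one 2-D point per dataset
print(coordinates.head())
```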
Cite this work:
@article{ribeiro4572173explanations,
title={Explanations Based on Item Response Theory (Exirt): A Model-Specific Method to Explain Tree-Ensemble Model in Trust Perspective},
author={Ribeiro Filho, Jos{\'e} de Sousa and Cardoso, Lucas Felipe Ferraro and Silva, Ra{\'\i}ssa Lorena Silva da and Carneiro, Nikolas Jorge Santiago and Santos, Vitor Cirilo Araujo and Alves, Ronnie Cley de Oliveira},
journal={Available at SSRN 4572173}
}