This repository contains the implementation for the paper "Crossing the Divide: Designing Layers of Explainability", accepted at the 23rd International Conference on Artificial Intelligence and Soft Computing (ICAISC 2024). It presents a novel approach to text classification that emphasizes explainability without significantly compromising performance.
Model dumps and supplemental material are shared in a Google Drive folder.
In the era of deep learning, the opaque nature of sophisticated models often stands at odds with the growing demand for transparency and explainability in Artificial Intelligence. This paper introduces a novel approach to text classification that emphasizes explainability without significantly compromising performance. We propose a modular framework to distill and aggregate information in a manner conducive to human interpretation. At the core of our methodology is the premise that features extracted at the finest granularity are inherently explainable and reliable; compared with methods that explain predictions through word-level importance, this layered aggregation of low-level features allows us to trace a clearer trail of the model's decision-making process. Our results demonstrate that this approach yields effective explanations with a marginal reduction in accuracy, presenting a compelling trade-off for applications where understandability is paramount.
- `dataset/`: contains the datasets used in the study.
- `dumps/`: contains dumps of the models.
- `src/`: source code for the proposed framework.
- `src/explanations/`: scripts for generating explanations.
- `src/text_classification/base_models/evaluate_accuracy.py`: reproduces the accuracy results reported in the paper for the base models.
- `src/text_classification/deep_learning/evaluate_accuracy.py`: reproduces the accuracy results reported in the paper for the deep learning models.
```bash
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install requirements
pip install -r requirements.txt
```
- Download the CMSB dataset from its official academic source and move it into this repository's `dataset/` directory.
- Download the IMDb dataset from Kaggle or the official IMDb website and move it into this repository's `dataset/` directory.
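The placement step above can be sketched in the shell; the file names are assumptions, not the actual download names:

```bash
# Create the dataset directory and move the downloaded files into it.
# File names below are placeholders -- use whatever the downloads provide.
mkdir -p dataset
# mv ~/Downloads/cmsb_dataset.csv dataset/
# mv ~/Downloads/imdb_dataset.csv dataset/
```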
- Download the pretrained models from the Google Drive folder.
- Place them in the `dumps/` folder.
To use the repository:

- Generate explanations: navigate to `src/explanations/` and run `local_explanation` / `local_explanation_deep`. In each script, set the model path and the dataset to be used.
- Evaluate accuracy: go to `src/text_classification/` and run `base_models/evaluate_accuracy.py` or `deep_learning/evaluate_accuracy.py` to reproduce the accuracy results. Set the model and `DATASET` variables in both scripts (the deep model name must be set in `src/text_classification/deep_learning/config.yml`, under `testing` -> `model name`).
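Based on the key names mentioned above, the relevant fragment of `config.yml` presumably looks something like the following sketch; the model identifier is a placeholder assumption, not the model actually used in the paper:

```yaml
testing:
  model name: "distilbert-base-uncased"  # hypothetical HuggingFace model id
```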
You can use `src/text_classification/deep_learning/finetune.py` and `src/text_classification/base_models/train_classifier.py` to fine-tune a language model from HuggingFace or to train an XGBoost classifier, respectively.

Features can be extracted by running `src/features_analysis/aggregate_features.py`, setting the correct dataset in the script.
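For intuition only, here is a minimal sketch of the general pattern of extracting fine-grained, human-readable text features and aggregating them. This is not the repository's `aggregate_features.py` — the feature names and aggregation choice (a simple per-feature mean) are illustrative assumptions:

```python
# Illustrative sketch: extract simple, inherently interpretable features
# per text, then aggregate them across a dataset. NOT the repo's actual code.
import re
from statistics import mean

def extract_features(text: str) -> dict:
    """Fine-grained, human-readable features for a single text."""
    words = re.findall(r"\w+", text)
    return {
        "n_words": len(words),
        "avg_word_len": mean(len(w) for w in words) if words else 0.0,
        "n_exclamations": text.count("!"),
    }

def aggregate(feature_rows: list) -> dict:
    """Aggregate low-level features over a dataset (mean per feature)."""
    keys = feature_rows[0].keys()
    return {k: mean(row[k] for row in feature_rows) for k in keys}

rows = [extract_features(t) for t in ["Great movie!", "Terrible plot, bad acting."]]
print(aggregate(rows))
```

Because each feature is individually meaningful, the aggregated values remain interpretable — the property the paper's layered approach builds on.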
This project is licensed under the MIT License — see the LICENSE file for details.