Andrew0613 / PSAVE

Interpreting Feature Contributions with Prioritized Shapley Values

Introduction

PSAVE

PSAVE (Prioritized ShApley ValuE on feasible coalitions) is a game-theoretic method for interpreting the feature importance of DNNs. When applied to DNNs, the traditional Shapley value suffers from problems that arise from the unsatisfactory nature of the payoff function. PSAVE tries to address these problems by using feasible coalitions and feature priorities.
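
For readers unfamiliar with the baseline, the following is a minimal sketch of the classical (unprioritized) Shapley value computed exactly on a toy game. It is not part of the PSAVE code; the names shapley_values and toy_payoff and the payoff itself are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1 under a payoff v(frozenset)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Probability weight of a coalition of size k that excludes player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


# Hypothetical payoff: features 0 and 1 only pay off together (worth 2),
# while feature 2 contributes 1 on its own.
def toy_payoff(s):
    return (2.0 if {0, 1} <= s else 0.0) + (1.0 if 2 in s else 0.0)


phi = shapley_values(3, toy_payoff)
print(phi)
```

The exact computation enumerates every coalition, so its cost grows exponentially with the number of features. That cost is one reason to restrict attention to a smaller family of coalitions.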

Datasets

We use the MNIST dataset and the Boston House Price (BHP) dataset in our experiments.

You can download the MNIST dataset from http://yann.lecun.com/exdb/mnist/, or use the corresponding package provided by PyTorch. The BHP dataset can be found in our experiment code.

Code

We also drew on the experiment code of SAGE (Shapley Additive Global importancE), and pieces of it appear in our code. SAGE's code can be downloaded from http://github.com/iancovert/sage/. To run our experiments, first install the sage-importance package with pip.

pip install sage-importance

Usage

PSAVE is a model-agnostic method, so in theory it can be applied to any kind of model. However, we have only tested its performance on DNNs. The general way of using PSAVE is as follows.

import numpy as np

import psave
import find_fc
import w_with_fc
from d_calculator import DCalculatorPreMem
from v_calculator import VCalculator

# Load data
x, y = ...
feature_num = ...

# Load model
model = ...

# Preparations
met_path = ...
batch_size = ...

calc_v = VCalculator(x, y, model, 'mse', batch_size)
F = find_fc.find_fc(feature_num, calc_v)
calc_div = DCalculatorPreMem(x, y, model, F, 'mse', batch_size)
w = w_with_fc.get_w_with_fc(feature_num, F)

# Evaluation
res = psave.psave_whole_pre_mem(feature_num, w, F, calc_div, tp=True)
np.savetxt(met_path, res)

For the loss function, only 'mse' (mean squared error) and 'cross entropy' are supported.
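
As a reminder of what these two losses measure, here is a plain-Python sketch of their conventional definitions. It is not the implementation used by VCalculator, and the function names are illustrative only.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over paired scalar targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(labels, probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    labels: integer class indices, one per sample.
    probs:  per-sample lists of class probabilities.
    """
    total = sum(math.log(max(p[c], eps)) for c, p in zip(labels, probs))
    return -total / len(labels)


regression_loss = mse([1.0, 2.0], [1.0, 4.0])
classification_loss = cross_entropy([0, 1], [[0.5, 0.5], [0.5, 0.5]])
```

Mean squared error suits regression targets such as BHP, while cross entropy suits classification targets such as MNIST.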

Unfortunately, we have not yet organized our code carefully, so it may not be robust or easy to extend. To run our experiment code, you may need to modify it first. For now, the code is best used as a reference.

File Structure

core

We present two examples of using our PSAVE method, one on MNIST and one on BHP.

On MNIST, we use a heuristic method for constructing feasible coalitions on datasets with image features, and we use a 2-D Gaussian density function as the priority.
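
A 2-D Gaussian priority over pixels can be sketched as follows. This is a minimal illustration that assumes an isotropic Gaussian centered on the image; the function name gaussian_priority, the centering, and the value of sigma are assumptions and are not taken from the repository.

```python
import math


def gaussian_priority(height, width, sigma):
    """Isotropic 2-D Gaussian density at each pixel, centered on the image."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [
        [
            norm * math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2.0 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


# 28 x 28 matches MNIST images; sigma=7.0 is an arbitrary illustrative choice.
w = gaussian_priority(28, 28, sigma=7.0)
```

Under these assumptions, pixels near the center of the image receive a higher priority than pixels near the border.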

On BHP, we use a more general greedy algorithm to construct feasible coalitions, and we use the number of feasible coalitions that a feature participates in as its priority.
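
The count-based priority described above can be sketched in a few lines. The function name participation_priority and the example coalitions are hypothetical; the actual implementation is get_w_with_fc in w_with_fc.py and may differ in detail.

```python
def participation_priority(feature_num, feasible_coalitions):
    """Priority of each feature = number of feasible coalitions containing it."""
    counts = [0] * feature_num
    for coalition in feasible_coalitions:
        for feature in coalition:
            counts[feature] += 1
    return counts


# Hypothetical feasible coalitions over three features.
F = [{0}, {1}, {2}, {0, 1}, {0, 1, 2}]
w = participation_priority(3, F)
print(w)
```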

details

We describe the BHP code in detail as an example, since it is the more general of the two.

  • boston.py: The sample evaluation code on BHP.
  • v_calculator.py: Provides the callable class VCalculator for computing the payoff function defined by SAGE.
  • d_calculator.py: Provides the callable class DCalculatorPreMem for computing coalition dividends with memoized search.
  • psave.py: Provides the function psave_whole_pre_mem for computing the PSAVE of all features.
  • find_fc.py: Provides the function find_fc_v for constructing feasible coalitions in a greedy way.
  • w_with_fc.py: Provides the function get_w_with_fc for computing feature priority.
  • imputers.py: Masks particular features. See SAGE's Imputers for reference.
  • model_train.py: Trains the DNN model.
  • net.py: Defines the DNN model.
  • utils: Provides functions for set operations.
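
To illustrate how coalition dividends and priorities can fit together, the sketch below uses the classical Harsanyi dividend, d(S) = v(S) minus the dividends of all proper non-empty subsets of S, with memoization, and then splits each coalition's dividend among its members in proportion to their priority (the weighted Shapley value rule). Whether d_calculator.py and psave.py follow exactly this definition and this rule is an assumption. All names here are hypothetical, and the payoff is assumed to be 0 for the empty coalition.

```python
from itertools import combinations


def make_dividend(v):
    """Return a memoized function d(S) computing classical Harsanyi dividends."""
    cache = {}

    def dividend(s):
        s = frozenset(s)
        if s not in cache:
            members = sorted(s)
            total = v(s)
            # Subtract the dividends of all proper non-empty subsets of s.
            for k in range(1, len(members)):
                for t in combinations(members, k):
                    total -= dividend(t)
            cache[s] = total
        return cache[s]

    return dividend


def weighted_shares(feature_num, w, coalitions, dividend):
    """Split each coalition's dividend among its members in proportion to w."""
    phi = [0.0] * feature_num
    for s in coalitions:
        d = dividend(s)
        total_w = sum(w[i] for i in s)
        for i in s:
            phi[i] += w[i] / total_w * d
    return phi


# Hypothetical payoff: features 0 and 1 are worth 2 together; feature 2 is worth 1.
def toy_payoff(s):
    return (2.0 if {0, 1} <= s else 0.0) + (1.0 if 2 in s else 0.0)


d = make_dividend(toy_payoff)
phi = weighted_shares(3, [3.0, 1.0, 1.0], [{2}, {0, 1}], d)
print(phi)
```

In this toy game only the coalitions {2} and {0, 1} carry a non-zero dividend, so restricting the sum to them loses nothing, and the higher-priority feature 0 takes the larger share of the joint dividend.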

feasible_condition_BHP

We validate our greedy algorithm for constructing feasible coalitions here. The core code can be viewed in supera.py.

feature_selection_BHP

This folder contains the feature selection and feature importance experiments on BHP, as well as experiments that validate PSAVE's convergence behavior. The core code is in boston.py, boston_sage.py, and convergence.py.

feature_selection_MNIST

This folder contains the feature selection experiments on MNIST. The core code is in feature selection.ipynb. In this folder, we mainly use code from http://github.com/iancovert/sage/, so you can refer to it for details.

If you want to run the code, remember to first evaluate the model with PSAVE and save the result to a local file. Then modify the corresponding paths in feature selection.ipynb.
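
The save-then-load workflow can be sketched as follows, using np.savetxt as in the usage example and np.loadtxt to read the values back. The file path, the importance values, and the helper top_k_features are hypothetical. Treating a larger value as more important is also an assumption made for this illustration.

```python
import os
import tempfile

import numpy as np


def top_k_features(values, k):
    """Indices of the k features with the largest values, best first."""
    order = np.argsort(-np.asarray(values), kind="stable")
    return order[:k].tolist()


# Hypothetical path and values standing in for a saved PSAVE result.
met_path = os.path.join(tempfile.mkdtemp(), "psave_values.txt")
np.savetxt(met_path, np.array([0.1, 0.7, 0.2, 0.4]))

loaded = np.loadtxt(met_path)
selected = top_k_features(loaded, 2)
print(selected)
```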
