AnchorTabularExplainer without categorical features

Question

asstergi opened this issue 6 years ago

Firstly, the paper is great and I'm really looking forward to using the package.

I tried to use it on my own data, where the AnchorTabularExplainer() object has no categorical_names (i.e. no categorical features). When I call explain_instance(), the code reaches https://github.com/marcotcr/anchor/blob/master/anchor/anchor_tabular.py#L215 and, since there are no categorical features, the mapping dict remains empty and the method fails.

Am I missing something? Or, is there something I can do to overcome this?

Marco Tulio Correia Ribeiro · Answer 1 · Tue Feb 06 2018 06:56:46 GMT+0800 (China Standard Time)

Hello,
I'm glad you found the paper interesting.
You are not missing anything; this is a bug in the code.
The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".

I must have removed that at some point and forgotten to put it back in.
I'll try to add it back soon, thanks for letting me know.

Marco Tulio Correia Ribeiro · Answer 2 · Wed Feb 07 2018 05:23:49 GMT+0800 (China Standard Time)

In the meantime, you can discretize your data first, similar to what I do here
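
For readers without access to the linked example, here is a minimal sketch of quartile discretization in plain Python. It is an illustration under assumptions, not the code the link points to: the helper names (`fit_quartile_cuts`, `to_bin`, `bin_names`) and the salary column are hypothetical. Each numeric value becomes an integer bin code, and each code gets a readable name of the kind that appears in an anchor.

```python
import bisect
import statistics


def fit_quartile_cuts(column):
    """Return the three quartile cut points [Q1, median, Q3] of a numeric column."""
    return statistics.quantiles(column, n=4)


def to_bin(value, cuts):
    """Bin code i means cuts[i-1] < value <= cuts[i] (open-ended at both extremes)."""
    return bisect.bisect_left(cuts, value)


def bin_names(feature, cuts):
    """Readable name for every bin code, in the style 'salary > 5000.00'."""
    names = ['%s <= %.2f' % (feature, cuts[0])]
    for low, high in zip(cuts, cuts[1:]):
        names.append('%.2f < %s <= %.2f' % (low, feature, high))
    names.append('%s > %.2f' % (feature, cuts[-1]))
    return names


# Hypothetical numeric column.
salary = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
cuts = fit_quartile_cuts(salary)
salary_disc = [to_bin(v, cuts) for v in salary]

# Mapping from feature index to the names of its bins.
categorical_names = {0: bin_names('salary', cuts)}
print(salary_disc)
print(categorical_names[0])
```

A mapping of this shape (feature index to list of bin names) is what you would supply as categorical_names alongside the discretized data. With this approach the classifier has to be trained on the same bin codes.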

asstergi · Answer 3 · Thu Feb 22 2018 15:55:10 GMT+0800 (China Standard Time)

Hi @marcotcr,

I discretized the data and got anchor working, thank you!

However, I'm seeing some inconsistencies in the reported coverage and precision when I try to use the anchor explanation on the original dataset (i.e. before the discretization).

Not sure if you can help just by looking at this code, but here's what I'm doing:
```python
# exp, idx, predict_fn, X_trans_test_disc, X_trans and y_trans come from earlier in the session.
import numpy as np

print('Anchor: %s' % (' AND '.join(exp.names())))

# Rows of the discretized test set that fall in the same bins as the explained instance.
fit_anchor = np.where(np.all(X_trans_test_disc[:, exp.features()] == X_trans_test_disc[idx][exp.features()], axis=1))[0]
print('Anchor test coverage: %.4f' % (fit_anchor.shape[0] / float(X_trans_test_disc.shape[0])))
print('Anchor test precision: %.4f' % (np.mean(predict_fn(X_trans_test_disc[fit_anchor]) == predict_fn(X_trans_test_disc[idx].reshape(1, -1)))))

# The same anchor written as thresholds on the original (non-discretized) data.
anch = y_trans[(X_trans['this_race_last_year_result'] > 1.50) &
               (X_trans['grid'] > -9.50) &
               (X_trans['grid'] <= -5.50)]
print('Anchor test coverage (orig): %.4f' % (1.0 * anch.shape[0] / y_trans.shape[0]))
print('Anchor test precision (orig): %.4f' % (1.0 * anch.sum() / anch.shape[0]))
```

And here's the output:

Anchor: -9.50 < grid <= -5.50 AND this_race_last_year_result > 1.50

Anchor test coverage: 0.0316
Anchor test precision: 1.0000

Anchor test coverage (orig): 0.0486
Anchor test precision (orig): 0.8527

I would expect the figures to match. Any idea on this?

Marco Tulio Correia Ribeiro · Answer 4 · Tue Feb 27 2018 08:20:57 GMT+0800 (China Standard Time)

If the validation and test distributions are similar, the numbers should match. I would have to see it in more detail to understand if your discretization is doing something or if there's a bug in the code. I can take a look if you can share a notebook.
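
As a point of comparison, here is a small self-contained sketch of when the two computations agree. Everything in it is assumed for illustration: the rows, the toy `predict` function, and the bin edges (borrowed from the anchor printed in the question). Coverage is the fraction of rows on which the anchor holds, and precision is the fraction of covered rows where the model's prediction matches its prediction for the explained instance. Evaluated on the same rows with the same model, the bin-based and threshold-based versions give identical numbers. Scoring against labels instead of model predictions, as the second computation in the question appears to do with y_trans, can give a different precision; so can evaluating on a different set of rows.

```python
import bisect

# Hypothetical raw rows (grid, last_year_result) laid out on a regular grid of values.
rows = [(g / 2.0, r / 4.0) for g in range(-40, 1) for r in range(0, 13)]

GRID_CUTS = [-9.5, -5.5]   # bin 1 means -9.5 < grid <= -5.5
RESULT_CUTS = [1.5]        # bin 1 means last_year_result > 1.5


def to_bins(row):
    """Discretize one raw row into a tuple of bin codes."""
    return (bisect.bisect_left(GRID_CUTS, row[0]),
            bisect.bisect_left(RESULT_CUTS, row[1]))


def predict(row):
    """Toy stand-in for the black box model (takes raw values)."""
    grid, result = row
    return int(result > 1.0 and grid > -12.0)


instance = (-7.0, 2.0)      # the explained instance
target = predict(instance)  # the prediction the anchor is supposed to preserve


def holds_on_bins(row):
    """Anchor check in bin space: same bins as the explained instance."""
    return to_bins(row) == to_bins(instance)


def holds_on_raw(row):
    """The same anchor written as thresholds on the raw values."""
    grid, result = row
    return result > 1.5 and -9.5 < grid <= -5.5


def coverage_and_precision(holds):
    covered = [row for row in rows if holds(row)]
    coverage = len(covered) / float(len(rows))
    precision = sum(predict(row) == target for row in covered) / float(len(covered))
    return coverage, precision


stats_bins = coverage_and_precision(holds_on_bins)
stats_raw = coverage_and_precision(holds_on_raw)
print(stats_bins, stats_raw)

# Hypothetical noisy labels: every 7th row disagrees with the model.
labels = [predict(row) if i % 7 else 1 - predict(row) for i, row in enumerate(rows)]
covered_labels = [lab for row, lab in zip(rows, labels) if holds_on_raw(row)]
label_precision = sum(lab == target for lab in covered_labels) / float(len(covered_labels))
print(label_precision)
```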

The newest version I uploaded has discretizing built in; you may want to give it a try.
It may be buggy since I didn't test it thoroughly, so it may be safer to train a classifier on discretized data like you're doing.

ajayaadhikari · Answer 5 · Thu Mar 29 2018 22:26:49 GMT+0800 (China Standard Time)

Hello @marcotcr,
I am also trying to use numerical features.
You suggested discretizing the data before giving it to AnchorTabularExplainer, right?
How will the AnchorTabularExplainer know how to inverse-discretize the data to get predictions on the perturbed samples?

Marco Tulio Correia Ribeiro · Answer 6 · Fri Mar 30 2018 02:46:50 GMT+0800 (China Standard Time)

If you discretize the data before you give it to AnchorTabularExplainer, you have to train the model on the discretized features. If you want the black box model to use numerical features, you have to use the newest version, which has built-in discretization.
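
To make the distinction concrete, here is a hedged sketch of what "inverse discretizing" could look like: an adapter that turns a perturbed bin code back into a plausible raw value before calling a model trained on raw features. This is one possible scheme (sampling a training value from the same bin), not necessarily what the library's built-in discretizer does; `make_undiscretizer`, `raw_model`, and the data are all hypothetical.

```python
import bisect
import random


def make_undiscretizer(train_column, cuts, seed=0):
    """Build a function that maps a bin code back to a raw value from that bin."""
    rng = random.Random(seed)
    values_by_bin = {}
    for value in train_column:
        values_by_bin.setdefault(bisect.bisect_left(cuts, value), []).append(value)

    def undiscretize(bin_code):
        # Sample one training value that fell into the requested bin.
        return rng.choice(values_by_bin[bin_code])

    return undiscretize


def raw_model(salary):
    """Toy black box model that expects the raw (continuous) feature."""
    return int(salary > 5000)


train_salary = [1000, 2500, 4000, 5500, 7000, 9000]
cuts = [3000, 6000]  # hypothetical bin edges
undiscretize = make_undiscretizer(train_salary, cuts)


def predict_on_bins(bin_code):
    """Adapter: lets the raw-feature model score a sample expressed as a bin code."""
    return raw_model(undiscretize(bin_code))


print([predict_on_bins(code) for code in (0, 1, 2)])
```

If you discretize up front instead, no such adapter is needed, because the model itself is trained on bin codes and can score perturbed samples directly.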

eindzl · Answer 7 · Fri Jul 13 2018 15:57:54 GMT+0800 (China Standard Time)

Hi there.
I found the same problem and used the following workaround, which works fine for me.
In the file anchor_tabular.py, add an else clause to the __init__ method of class AnchorTabularExplainer:

 class AnchorTabularExplainer(object):

    ... original code ...

    def __init__(self, class_names, feature_names, data=None,

        ... original code ...

        if categorical_names:
            # TODO: Check if this n_values is correct!!
            cat_names = sorted(categorical_names.keys())
            n_values = [len(categorical_names[i]) for i in cat_names]
            self.encoder = sklearn.preprocessing.OneHotEncoder(
                categorical_features=cat_names,
                n_values=n_values)
            self.encoder.fit(data)
            self.categorical_features = self.encoder.categorical_features
        else:  ## Allow for datasets without categorical names
            categorical_names = {}

        ... original code ...

This prevents the update from failing and allows the explainer to discretize your numerical variables.
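
The effect of the else clause can be illustrated in isolation. The sketch below assumes the failure comes from calling update() on a categorical_names mapping that was never supplied; `ToyExplainer` and `discretizer_names` are hypothetical stand-ins, not the real class or its parameters.

```python
class ToyExplainer(object):
    """Hypothetical stand-in, not the real AnchorTabularExplainer."""

    def __init__(self, categorical_names=None, discretizer_names=None):
        # Without this fallback, a missing mapping would stay None and the
        # update() call below would raise AttributeError.
        if not categorical_names:
            categorical_names = {}
        self.categorical_names = dict(categorical_names)
        # Merge in the bin names produced by discretizing numerical features.
        self.categorical_names.update(discretizer_names or {})


explainer = ToyExplainer(discretizer_names={0: ['grid <= -9.50', 'grid > -9.50']})
print(explainer.categorical_names)
```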

Amr Ebaid · Answer 8 · Wed Feb 06 2019 07:12:44 GMT+0800 (China Standard Time)

> The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".
>
> I must have removed that at some point and forgotten to put it back in.
> I'll try to add it back soon, thanks for letting me know.

~~Has this been fixed in the code? Or we still have to do the workaround?~~
Never mind, I figured it out. I had to fit the classifier too, not only the explainer.

Thanks,
Amr

Kshitij Yeotikar · Answer 9 · Wed Jul 03 2019 19:53:37 GMT+0800 (China Standard Time)

@eindzl Thanks, I had the same problem, and it now works correctly after your update.

Sean Saito · Answer 10 · Mon Oct 07 2019 14:43:06 GMT+0800 (China Standard Time)

> Hi there.
> I found the same problem and used the following workaround, which works fine for me.
> In the file anchor_tabular.py add an else clause to the __init__ method of class AnchorTabularExplainer [...]

Will this workaround be implemented at some point?