Want to drop the data and prediction columns in the explain dataframe output

Question

busintelanalytics opened this issue 6 years ago

Hi Thomas, great package.

Aloha from Hawaii!

I'm attempting to process 50K cases, 4 features per case. I'm really only after the content below, minus the data and prediction columns, since these columns explode the dataframe size. How do I run the explain function without outputting the data and prediction columns?

A data.frame encoding the explanations one row per explained observation. The columns are:
• model_type: The type of the model used for prediction.
• case: The case being explained (the rowname in cases).
• model_r2: The quality of the model used for the explanation
• model_intercept: The intercept of the model used for the explanation
• model_prediction: The prediction of the observation based on the model used for the explanation.
• feature: The feature used for the explanation
• feature_value: The value of the feature used
• feature_weight: The weight of the feature in the explanation
• feature_desc: A human readable description of the feature importance.
• data: Original data being explained
• prediction: The original prediction from the model

Thomas Lin Pedersen · Answer 1 · Thu Nov 29 2018 17:52:29 GMT+0800 (China Standard Time)

Currently you can't... But bear in mind as well that explaining 50k observations is a crazy endeavour as the processing time will amount to days... The lime algorithm has never really been designed for such use

busintelanalytics · Answer 2 · Thu Nov 29 2018 18:00:46 GMT+0800 (China Standard Time)

I LOVE this package. Fantastic!! We’re using it to explain a keras model, and our use case would in practice only be run for several hundred cases at a time. The only holdup is the large data column that gets added to the dataframe, which we really don’t need, especially since it repeats for every feature of every case. Please consider this enhancement. Otherwise, fantastic stuff. Mahalo

Thomas Lin Pedersen · Answer 3 · Thu Nov 29 2018 18:02:23 GMT+0800 (China Standard Time)

If you are running it in batches you could simply drop the columns manually after you get the explanation for each batch... It will be some time before I get to do further work on lime.
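
For readers landing here, a minimal sketch of the batch-and-drop workaround described above. The helper names (`drop_bulky_columns`, `explain_in_batches`, `mock_explain`) and the batch size are hypothetical and not part of lime. `mock_explain` only imitates the shape of an explanation so the example runs without a fitted model. In real use you would pass something like `function(batch) lime::explain(batch, explainer, n_labels = 1, n_features = 4)`, assuming `explainer` is your own lime explainer.

```r
# Hypothetical helper (not part of lime): remove the bulky columns from an
# explanation data frame. Columns that are absent are silently ignored.
drop_bulky_columns <- function(explanation, cols = c("data", "prediction")) {
  explanation[, setdiff(names(explanation), cols), drop = FALSE]
}

# Hypothetical wrapper: explain `cases` in chunks of `batch_size` rows,
# slimming each chunk before keeping it, so the full-size output never
# has to be held in memory all at once.
explain_in_batches <- function(cases, explain_fun, batch_size = 500) {
  batch_id <- ceiling(seq_len(nrow(cases)) / batch_size)
  batches <- split(cases, batch_id)
  slim <- lapply(batches, function(batch) drop_bulky_columns(explain_fun(batch)))
  out <- do.call(rbind, slim)
  rownames(out) <- NULL
  out
}

# Stand-in for lime::explain(), for illustration only. It returns one row per
# case plus list columns named like the ones discussed in this issue.
mock_explain <- function(batch) {
  n <- nrow(batch)
  out <- data.frame(
    case = rownames(batch),
    feature = "x",
    feature_weight = 0.1,
    stringsAsFactors = FALSE
  )
  out$data <- lapply(seq_len(n), function(i) as.list(batch[i, , drop = FALSE]))
  out$prediction <- lapply(seq_len(n), function(i) list(yes = 0.5, no = 0.5))
  out
}

cases <- data.frame(x = 1:5, y = 6:10)
slim_explanations <- explain_in_batches(cases, mock_explain, batch_size = 2)
print(names(slim_explanations))
```

Dropping the columns inside the loop, rather than once at the end, is what keeps peak memory down: each batch is slimmed before the next one is computed.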

busintelanalytics · Answer 4 · Thu Nov 29 2018 18:05:47 GMT+0800 (China Standard Time)

Yep. That’s what I’m doing. Anyways. Thanks!
