Add variable importance for survival deep neural nets

Question

ilagith opened this issue 3 years ago

Thank you so much for this beautiful package!

It would be great to be able to compute variable importance for deep survival neural networks. For instance, 'TFDeepSurv' in Python gives the user this possibility. Could something similar be added here as well?

Thank you a lot!

Raphael Sonabend · Answer 1 · Mon Apr 19 2021 17:33:50 GMT+0800 (China Standard Time)

Hiya!

Thank you so much for this beautiful package!

Thanks!

It would be great to be able to compute variable importance for deep survival neural networks. For instance, 'TFDeepSurv' in Python gives the user this possibility. Could something similar be added here as well?

I can't see it in that package. Would you mind linking the file or lines of code where it is mentioned?

ilagith · Answer 2 · Mon Apr 19 2021 17:51:34 GMT+0800 (China Standard Time)

I can't see it in that package. Would you mind linking the file or lines of code where it is mentioned?

Sure!

For DeepSurv, this is the link to the Python code (lines 291 to 311):

https://github.com/devhliu/TFDeepSurv/blob/master/tfdeepsurv/L2DeepSurv.py

I have also seen this implemented in R in this package, which is still under development (from line 157):

https://github.com/CYGUBICKO/satpred/blob/master/R/deepsurv_satpred.R

Raphael Sonabend · Answer 3 · Mon Apr 19 2021 18:15:15 GMT+0800 (China Standard Time)

Right, okay, so it is just a generic permutation algorithm. To be honest, I am unsure whether this is the right place for it, for a few reasons:

This package is primarily a model 'zoo', as in we have the model implementations and not much else, and I am not sure if I want to offer many more features than that.
Because the package's features are limited, you would probably need other packages (e.g. see my tutorial with mlr3proba) for tuning anyway, and without tuning any interpretability methods are useless (feature importance has no meaning if the model is bad).
There is no good survival measure, which means that even if we had this method and used the C-index (like in the links above), it isn't really meaningful to say that one feature increases/decreases concordance by a certain amount (it isn't a proper scoring rule, so drawing any causal conclusion is questionable).

Let me know your thoughts on this. I'm leaving this issue open, and if it gets a lot of support (a thumbs-up on your post will do) then I'll consider implementing it by either:

Directly implementing it here, although there would then be many implementations across many packages of a fairly standard function; or
Making a PR to iml to support survival analysis, which is my preference given the tuning reasons above.
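
As a rough illustration of the generic permutation algorithm discussed above, here is a minimal Python sketch using only NumPy. It is not the code from TFDeepSurv, satpred, or iml: the function names (`concordance_index`, `permutation_importance`, `toy_risk`), the synthetic data, and the linear risk score are all assumptions made for this example. It scores a model with Harrell's C-index, shuffles one feature at a time, and reports the average drop in concordance.

```python
import numpy as np


def concordance_index(time, event, risk):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        later = time > time[i]
        comparable += later.sum()
        # Higher risk should go with the earlier event; ties in risk count as half.
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return concordant / comparable


def permutation_importance(predict_risk, X, time, event, n_repeats=10, seed=0):
    """Mean drop in C-index when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = concordance_index(time, event, predict_risk(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to the outcome
            drops[j] += baseline - concordance_index(time, event, predict_risk(X_perm))
    return baseline, drops / n_repeats


# Synthetic data (an assumption for this sketch): feature 0 matters a lot,
# feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
weights = np.array([1.5, 0.3, 0.0])
X = rng.normal(size=(300, 3))
true_time = rng.exponential(1.0 / np.exp(X @ weights))
censor_time = rng.exponential(2.0, size=300)
event = true_time <= censor_time
time = np.minimum(true_time, censor_time)


def toy_risk(features):
    """Hypothetical stand-in for a fitted model's risk prediction."""
    return features @ weights


baseline, importance = permutation_importance(toy_risk, X, time, event)
print("baseline C-index:", round(float(baseline), 3))
print("mean drop per feature:", np.round(importance, 3))
```

Because the procedure only needs a function that maps features to risk scores, the same code can be applied to any fitted model. The caveats raised above still hold: the drop in concordance is only as meaningful as the model and the measure behind it.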

ilagith · Answer 4 · Mon Apr 19 2021 19:13:52 GMT+0800 (China Standard Time)

This package is primarily a model 'zoo', as in we have the model implementations and not much else, and I am not sure if I want to offer many more features than that.

Because the package's features are limited, you would probably need other packages (e.g. see my tutorial with mlr3proba) for tuning anyway, and without tuning any interpretability methods are useless (feature importance has no meaning if the model is bad).

There is no good survival measure, which means that even if we had this method and used the C-index (like in the links above), it isn't really meaningful to say that one feature increases/decreases concordance by a certain amount (it isn't a proper scoring rule, so drawing any causal conclusion is questionable).

I agree with this, since I am also using mlr3proba for tuning. Once the model is tuned, it would be great to retrieve not only the performance measure but also a feature ranking, for instance to make comparisons with a RandomSurvivalForest via mlr3proba. This would allow users to better understand what is going on inside the model, and would make the great mlr3proba package even more complete.

Directly implementing it here, although there would then be many implementations across many packages of a fairly standard function; or

Making a PR to iml to support survival analysis, which is my preference given the tuning reasons above.

But I also agree that this is more of a model 'zoo' package, and that maybe this issue is more related to model interpretability. Indeed, it would be amazing if iml supported survival analysis, or if https://github.com/fawda123/NeuralNetTools supported survival objects created via mlr3.

Raphael Sonabend · Answer 5 · Mon Apr 19 2021 19:32:41 GMT+0800 (China Standard Time)

I agree with this, since I am also using mlr3proba for tuning. Once the model is tuned, it would be great to retrieve not only the performance measure but also a feature ranking, for instance to make comparisons with a RandomSurvivalForest via mlr3proba. This would allow users to better understand what is going on inside the model, and would make the great mlr3proba package even more complete.

I think it's also important to bear in mind that variable importance from different models (and thereby different methods) isn't directly comparable, which is a good argument for model-agnostic methods like those in {iml}.

But I also agree that this is more of a model 'zoo' package, and that maybe this issue is more related to model interpretability. Indeed, it would be amazing if iml supported survival analysis, or if https://github.com/fawda123/NeuralNetTools supported survival objects created via mlr3.

I think I'd rather make a PR to {iml}, as having to maintain mlr3 objects in another package would be a nightmare.