som-shahlab / survlearners

Metalearners for Survival Data

Home Page: https://som-shahlab.github.io/survlearners

Tracking of readable implementations of estimators

nignatiadis opened this issue

Many of these are in the file comparison_estimators.R, which should be part of the package.

  • T learners
    • Cox-Lasso (Nikos)
    • Random survival forest (Erik)
  • S learners
    • Cox-Lasso (Nikos)
    • Random survival forest (Erik)
  • F learners
    • Lasso (Nikos)
    • Random forest (Erik)
  • X learners
    • Cox-Lasso + Lasso (Nikos)
    • Random survival forest + Lasso (Erik)
    • Random survival forest + Random forest (Erik)
  • R learners
    • Cox-Lasso + Lasso (Nikos)
    • Random survival forest + Lasso (Erik)
    • Random survival forest + Random forest (Erik)

@nignatiadis: come up with consistent estimator_names.R file names, etc.

@nignatiadis I will do the S- and T-learners with lasso, and you can work on the F-, X-, and R-lasso

Things are looking great; I've gone over the *_grf estimators now, with comments:

surv_fl_grf

  1. alpha = 0.05 should not be part of the signature; it should go into a list of optional arguments passed on to grf
  2. it should have an optional folds argument
  3. too much is going on with creating the folds; you can just do fold.id <- sample(rep(seq(nfolds), length = nrow(X)))
  4. in cent <- testData$Y; cent[testData$D == 0] <- times, I don't see the point of handling D == 0 when all censored observations are dropped afterwards anyway; more readable would be km <- survfit(Surv(Y[train], 1 - D[train]) ~ 1); summary(km, times = Y[test])$surv
  5. if t0 is less than the minimum failure time t.min, survival predicts P(T > t0) = P(T > t.min); that's not the same as grf/csf, where it is defined to be 1
  6. cen.times.index <- findInterval(cent, c.fit$failure.times) is wrong: if cent[i] is before the first failure in failure.times, then cen.times.index = 0, and in CSF these probabilities are defined to be 1
  7. sample.weights <- 1 / C.hat will be wrong if some C.hat = 0, which can happen with survival but not with grf's Nelson-Aalen
  8. All this is way too hard to read; it can be done without making any new data frames (see the sketch after this list):
Z <- W * D / W.hat - (1 - W) * D / (1 - W.hat)
cc <- (D == 1 | Y > t0)
regression_forest(X[cc, , drop = FALSE], Z[cc], sample.weights = sample.weights[cc])
  9. everywhere you do X[subset, ], you need to add drop = FALSE to keep it as a matrix for everything that expects a matrix
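
Putting a few of these together, here is a minimal sketch of what items 3, 4, 6, 8 and 9 could look like. The variable names (X, Y, D, W, W.hat, t0, nfolds) follow the discussion above, and the censoring fit is done on the full sample purely for illustration; this is not the package's actual code.

library(survival)
library(grf)

# item 3: fold assignment in one line
fold.id <- sample(rep(seq(nfolds), length.out = nrow(X)))

# item 4: a Kaplan-Meier fit on the flipped event indicator gives the censoring curve
c.fit <- survfit(Surv(Y, 1 - D) ~ 1)

# item 6: evaluate P(C > Y[i]); findInterval() returns 0 before the first jump
# of the curve, where the probability should be 1 (as in grf/csf)
# (item 15 below raises whether to evaluate at pmin(Y, t0) instead)
cen.times.index <- findInterval(Y, c.fit$time)
C.hat <- c(1, c.fit$surv)[cen.times.index + 1]

# items 8 and 9: transformed outcome, complete cases, IPC weights,
# no intermediate data frames, and drop = FALSE when subsetting X
sample.weights <- 1 / C.hat
Z <- W * D / W.hat - (1 - W) * D / (1 - W.hat)
cc <- (D == 1 | Y > t0)
fit <- regression_forest(X[cc, , drop = FALSE], Z[cc],
                         sample.weights = sample.weights[cc])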

surv_rl_grf_lasso
10. compute.oob needs to be FALSE for this workaround to work; see the comment in CSF here
11. times.index <- findInterval(times, y.fit$failure.times) is wrong if times.index is 0; CSF defines the probability to be 1 in that case
12. all the tempdat and binary.data data frames: same as above, almost unreadable

surv_rl_grf:
13. I don't see the point of tau.only in predict.surv_rl_grf
14. The same comments as above

surv_sl_grf: same as 11)

surv_tl_grf: same as 1) + 9) + 11)

surv_xl_grf_lasso: same as all other comments

surv_xl_grf: same as all other comments

  15. final question for all: I thought the relevant censoring probabilities were P(Ci > min(Ti, t0) | X)? In all the code, P(Ci > Ti | X) is used? That's not consistent with the final subsetting you do at the end...? complete.cases.or.updated.event.indicator = (D == 1 | Y > t0)

@erikcs Thank you so much for the detailed review and comments. These are all GREAT and VERY helpful, particularly item 15; you are right that the uncensored indicator should be (D == 1 | Y > t0), a good catch!!

I will first fix 15 and the ones that may directly affect our results, then go through the others. Thanks again!
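
To make item 15 concrete, here is a small continuation of the sketch above (same assumed variable names; illustrative only): evaluate the censoring curve at pmin(Y, t0) rather than at Y, so the weights correspond to P(Ci > min(Ti, t0) | X) for the observations kept by (D == 1 | Y > t0).

# continues the earlier sketch; names are assumptions, not package code
eval.times <- pmin(Y, t0)
cen.times.index <- findInterval(eval.times, c.fit$time)
C.hat <- c(1, c.fit$surv)[cen.times.index + 1]
sample.weights <- 1 / C.hat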

  7. sample.weights <- 1 / C.hat will be wrong if some C.hat = 0, which can happen with survival but not with grf's Nelson-Aalen

@nignatiadis What do you think we should do about this comment from Erik? In our setting, without adjusting for any Xs, it only happens if all subjects are censored, and censored before t0, which is rare but possible. I used to truncate the zero censoring weights to the smallest non-zero values, but I am not a big fan of weight truncation in general.

Checking the estimated censoring weights can be part of the input validation; if they are zero (or, in the case of Nelson-Aalen, extremely close to zero) you can raise an error and suggest trying to increase t0?

Thanks, Erik, yes, I think it is a good idea to raise an error. On the suggestion part, increasing t0 will give more time for developing an event, but it will also allow more censoring, say due to loss to follow-up. Maybe we can say "check input variables or consider adjusting t0"?
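
A minimal sketch of what that validation could look like; the helper name is hypothetical, and the message wording follows the suggestion above.

# hypothetical input-validation helper, not part of the package
validate_censoring_weights <- function(C.hat, tol = 1e-10) {
  if (any(C.hat <= tol)) {
    stop("Some estimated censoring probabilities are zero or near zero; ",
         "check the input variables or consider adjusting t0.")
  }
  invisible(C.hat)
}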

Minor extra points:

  1. The signature should ideally be consistent (X, Y, W, D, t0, W.hat, etc.) for every estimator; currently it is not (e.g. surv_rl_grf_lasso). Since some do not accept W.hat, we may consider adding a ... at the end (see the sketch after this list)

  2. All estimators based on Cox require Y strictly greater than 0; that's different from the rest, which accept 0. Cox fails with an error in this case
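
For reference, a sketch of what a shared signature could look like; the function name is hypothetical, the argument order follows point 1, and the Y > 0 check from point 2 is shown as it would apply to a Cox-based learner.

# hypothetical template for a consistent estimator signature;
# estimator-specific tuning parameters are absorbed by `...`
surv_learner_template <- function(X, Y, W, D, t0, W.hat = NULL, ...) {
  # a Cox-based learner would validate Y up front (point 2)
  if (any(Y <= 0)) {
    stop("Cox-based estimators require Y to be strictly positive.")
  }
  # fit the learner here, forwarding `...` to the underlying method
}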