som-shahlab / survlearners

Metalearners for Survival Data

Home Page: https://som-shahlab.github.io/survlearners

Tracking of readable implementations of estimators

nignatiadis opened this issue

Many of these are in the file comparison_estimators.R, which should be part of the package.

  • T learners
    • Cox-Lasso (Nikos)
    • Random survival forest (Erik)
  • S learners
    • Cox-Lasso (Nikos)
    • Random survival forest (Erik)
  • F learners
    • Lasso (Nikos)
    • Random forest (Erik)
  • X learners
    • Cox-Lasso + Lasso (Nikos)
    • Random survival forest + Lasso (Erik)
    • Random survival forest + Random forest (Erik)
  • R learners
    • Cox-Lasso + Lasso (Nikos)
    • Random survival forest + Lasso (Erik)
    • Random survival forest + Random forest (Erik)

@nignatiadis: come up with consistent estimator_names.R file names, etc.

@nignatiadis I will do the S- and T-learners with lasso, and you can work on the F-, X-, and R-lasso

Things are looking great; I've gone over the *_grf estimators now, with comments:

surv_fl_grf

  1. alpha = 0.05 should not be part of the signature; it should go into a list of optional arguments passed on to grf
  2. it should have an optional folds argument
  3. too much is going on with creating the folds; you can just do fold.id <- sample(rep(seq(nfolds), length = nrow(X)))
  4. in cent <- testData$Y; cent[testData$D == 0] <- times, I don't see the point of handling D == 0 when all censored observations are dropped afterwards anyway; more readable would be km <- survfit(Surv(Y[train], 1 - D[train]) ~ 1); summary(km, times = Y[test])$surv
  5. if t0 is less than the minimum failure time t.min, survival predicts P(T > t0) = P(T > t.min); that's not the same as grf/csf, where it is defined to be 1
  6. cen.times.index <- findInterval(cent, c.fit$failure.times) is wrong: if cent[i] is before the first failure in failure.times, then cen.times.index = 0, and in CSF these probabilities are defined to be 1
  7. sample.weights <- 1 / C.hat will be wrong if some C.hat = 0, which can happen with survival but not with grf's Nelson-Aalen
  8. All this is way too hard to read; it can be done without making any new data frames (see the sketch after this list):
Z <- W * D / W.hat - (1 - W) * D / (1 - W.hat)
cc <- (D == 1 | Y > t0)
regression_forest(X[cc, , drop = FALSE], Z[cc], sample.weights = sample.weights[cc])
  9. everywhere you do X[subset, ], you need to add drop = FALSE to keep it as a matrix for everything that expects a matrix
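
Putting a few of these together, here is a minimal sketch of what items 3, 4, 6, 8 and 9 could look like. The variable names (X, Y, D, W, W.hat, t0, nfolds) follow the discussion above, and the censoring fit is done on the full sample purely for illustration; this is not the package's actual code.

library(survival)
library(grf)

# item 3: fold assignment in one line
fold.id <- sample(rep(seq(nfolds), length.out = nrow(X)))

# item 4: a Kaplan-Meier fit on the flipped event indicator gives the censoring curve
c.fit <- survfit(Surv(Y, 1 - D) ~ 1)

# item 6: evaluate P(C > Y[i]); findInterval() returns 0 before the first jump
# of the curve, where the probability should be 1 (as in grf/csf)
# (item 15 below raises whether to evaluate at pmin(Y, t0) instead)
cen.times.index <- findInterval(Y, c.fit$time)
C.hat <- c(1, c.fit$surv)[cen.times.index + 1]

# items 8 and 9: transformed outcome, complete cases, IPC weights,
# no intermediate data frames, and drop = FALSE when subsetting X
sample.weights <- 1 / C.hat
Z <- W * D / W.hat - (1 - W) * D / (1 - W.hat)
cc <- (D == 1 | Y > t0)
fit <- regression_forest(X[cc, , drop = FALSE], Z[cc],
                         sample.weights = sample.weights[cc])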

surv_rl_grf_lasso
10. compute.oob needs to be FALSE for this workaround to work; see the comment in CSF here
11. times.index <- findInterval(times, y.fit$failure.times) is wrong if times.index is 0; CSF defines the probability to be 1 in that case
12. all the tempdat and binary.data data frames: same as above, almost unreadable

surv_rl_grf:
13. I don't see the point of tau.only in predict.surv_rl_grf
14. The same comments as above

surv_sl_grf: same as 11)

surv_tl_grf: same as 1) + 9) + 11)

surv_xl_grf_lasso: same as all other comments

surv_xl_grf: same as all other comments

  15. final question for all: I thought the relevant censoring probabilities were P(Ci > min(Ti, t0) | X)? In all the code, P(Ci > Ti | X) is used? That's not consistent with the final subsetting you do at the end...? complete.cases.or.updated.event.indicator = (D == 1 | Y > t0)

@erikcs Thank you so much for the detailed review and comments. These are all GREAT and VERY helpful, particularly item 15; you are right that the uncensored indicator should be (D == 1 | Y > t0), a good catch!!

I will first fix 15 and the ones that may directly affect our results, then go through the others. Thanks again!
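
To make item 15 concrete, here is a small continuation of the sketch above (same assumed variable names; illustrative only): evaluate the censoring curve at pmin(Y, t0) rather than at Y, so the weights correspond to P(Ci > min(Ti, t0) | X) for the observations kept by (D == 1 | Y > t0).

# continues the earlier sketch; names are assumptions, not package code
eval.times <- pmin(Y, t0)
cen.times.index <- findInterval(eval.times, c.fit$time)
C.hat <- c(1, c.fit$surv)[cen.times.index + 1]
sample.weights <- 1 / C.hat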

  7. sample.weights <- 1 / C.hat will be wrong if some C.hat = 0, which can happen with survival but not with grf's Nelson-Aalen

@nignatiadis What do you think we should do about this comment from Erik? In our setting, without adjusting for any Xs, it only happens if all subjects are censored, and censored before t0, which is rare but possible. I used to truncate the zero censoring weights to the smallest non-zero values, but I am not a big fan of weight truncation in general.

Checking the estimated censoring weights can be part of the input validation; if they are zero (or, in the case of Nelson-Aalen, extremely close to zero) you can raise an error and suggest trying to increase t0?

Thanks, Erik, yes, I think it is a good idea to raise an error. On the suggestion part, increasing t0 will give more time for developing an event, but it will also allow more censoring, say due to loss to follow-up. Maybe we can say "check input variables or consider adjusting t0"?
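
A minimal sketch of what that validation could look like; the helper name is hypothetical, and the message wording follows the suggestion above.

# hypothetical input-validation helper, not part of the package
validate_censoring_weights <- function(C.hat, tol = 1e-10) {
  if (any(C.hat <= tol)) {
    stop("Some estimated censoring probabilities are zero or near zero; ",
         "check the input variables or consider adjusting t0.")
  }
  invisible(C.hat)
}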

Minor extra points:

  1. The signature should ideally be consistent (X, Y, W, D, t0, W.hat, etc.) for every estimator; currently it is not (e.g. surv_rl_grf_lasso). Since some do not accept W.hat, we may consider adding a ... at the end (see the sketch after this list)

  2. All estimators based on Cox require Y strictly greater than 0; that's different from the rest, which accept 0. Cox fails with an error in this case
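
For reference, a sketch of what a shared signature could look like; the function name is hypothetical, the argument order follows point 1, and the Y > 0 check from point 2 is shown as it would apply to a Cox-based learner.

# hypothetical template for a consistent estimator signature;
# estimator-specific tuning parameters are absorbed by `...`
surv_learner_template <- function(X, Y, W, D, t0, W.hat = NULL, ...) {
  # a Cox-based learner would validate Y up front (point 2)
  if (any(Y <= 0)) {
    stop("Cox-based estimators require Y to be strictly positive.")
  }
  # fit the learner here, forwarding `...` to the underlying method
}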