susanathey / causalTree

Working repository for Causal Tree and extensions

Getting leaf assignments for new data

bquistorff opened this issue

To do split-sample estimation, we fit the tree structure on one sample, say data1, and then estimate treatment effects by leaf on data2. Is there an easy way to get the leaf assignments for data2? The leaf assignments for the original tree are in tree$where, but those don't seem to be updated by estimate.causalTree, and the $where field from honest.causalTree also appears to refer to the data used to fit the tree. rpart doesn't expose this easily either, but a workaround was noted here:

where2 = rpart:::pred.rpart(tree1, rpart:::rpart.matrix(data2))

Is there some better way to do this?
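The rpart:::rpart.matrix(data2) call above hands all of data2 to rpart's internal matcher. If data2 contains factor or character columns, calling rpart.matrix() on a plain data frame may not encode them the way the tree expects. A slightly safer variant mirrors what predict.rpart() does internally: it first builds a model frame from the tree's predictors only, then calls pred.rpart(). This is a minimal sketch under several assumptions. The helper name leaf_of() is hypothetical. rpart:::pred.rpart and rpart:::rpart.matrix are unexported internals that may change between rpart versions. It also assumes the fitted object (for example one returned by causalTree) keeps rpart's $terms and $frame components. The result uses the same convention as tree$where: row indices into tree$frame.

```r
library(rpart)

# Hypothetical helper (not part of rpart or causalTree): for each row of
# newdata, return the row index into fit$frame of the leaf it falls in,
# i.e. the same convention as fit$where. It mirrors predict.rpart()'s
# internals, so only the model's predictors reach pred.rpart().
leaf_of <- function(fit, newdata) {
  Terms <- delete.response(fit$terms)          # drop the outcome from the formula
  mf <- model.frame(Terms, newdata, na.action = na.pass,
                    xlev = attr(fit, "xlevels"))  # keep training factor levels
  rpart:::pred.rpart(fit, rpart:::rpart.matrix(mf))
}

# Usage: fit the structure on one half of mtcars, assign leaves on the other.
set.seed(1)
idx   <- sample(nrow(mtcars), 16)
data1 <- mtcars[idx, ]
data2 <- mtcars[-idx, ]
tree1 <- rpart(mpg ~ wt + hp, data = data1,
               control = rpart.control(minsplit = 6))
where2 <- leaf_of(tree1, data2)
```

On the data the tree was fitted to, leaf_of() should reproduce tree$where, which is a quick sanity check before using it on the estimation sample.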

The easiest approach is to call predict on the tree you get from estimate.causalTree and turn the rounded predictions into a factor variable, like this:
dataTest$leaff <- as.factor(round(predict(tree_honest_prune, newdata = dataTest, type = "vector"), 4))

This is not a perfect workaround: if two leaves have exactly the same estimates, they end up in the same factor level. That is unlikely, though, unless the outcome is binary.
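One way to avoid that tie problem while still going through predict is to give every node a unique fitted value before predicting. The sketch below copies the tree and overwrites frame$yval with each node's row number. It relies on predict(type = "vector") returning frame$yval for the leaf each row lands in, so the prediction becomes the leaf's row index (the tree$where convention) rather than its estimate. It uses a plain rpart regression tree on the built-in mtcars data as a stand-in. The names tree_idx and where2 are illustrative. Whether a causalTree object's predict method reads frame$yval the same way is an assumption worth checking against the package source.

```r
library(rpart)

# Stand-in for the causal tree: fit the structure on one half of mtcars.
set.seed(1)
idx   <- sample(nrow(mtcars), 16)
data1 <- mtcars[idx, ]
data2 <- mtcars[-idx, ]
tree1 <- rpart(mpg ~ wt + hp, data = data1,
               control = rpart.control(minsplit = 6))

# Copy the tree and replace each node's fitted value with its row index in
# frame; predict(type = "vector") then returns that index for the leaf each
# row of data2 falls in, instead of the leaf's estimate.
tree_idx <- tree1
tree_idx$frame$yval <- seq_len(nrow(tree_idx$frame))
where2 <- predict(tree_idx, newdata = data2, type = "vector")

# Label leaves by rpart node number (the row names of frame).
data2$leaff <- factor(rownames(tree1$frame)[where2])
table(data2$leaff)
```

Because the labels come from node numbers rather than rounded estimates, two leaves with identical estimates stay distinct, and the factor can be used directly to group data2 for per-leaf estimation.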

We can work on updating the leaf assignments in tree$where.