Tobit expected values incorrect?
izahn opened this issue
Consider these two models (from the tobit vignette):
```r
library(survival)  # survreg(), Surv() and the tobin data
library(Zelig)
m1 <- survreg(Surv(durable, durable > 0, type = 'left') ~ age + quant,
              data = tobin, dist = 'gaussian')
z.out <- zelig(durable ~ age + quant, model = "tobit", data = tobin)
```
The coefficients and vcov are the same:
```r
all.equal(vcov(z.out)[[1]], vcov(m1))
## TRUE
all.equal(coef(z.out), coef(m1))
## TRUE
```
but the expected values are very different:
```r
predict(m1,
        newdata = data.frame(lapply(model.frame(m1)[-1], mean)),
        se.fit = TRUE)
## $fit
##         1
## -2.067679
##
## $se.fit
##        1
## 1.962933
ev <- sim(setx(z.out))$get_qi(qi = "ev")
c(mean = mean(ev), sd = sd(ev))
##      mean        sd
## 1.5536944 0.6302974
```
Do we expect these to be so different? If so, why?
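A minimal sketch, assuming Zelig's tobit `ev` is meant to be the expected value of the observed, censored outcome E[Y] rather than the latent E[Y*] that `predict()` returns for `m1`. It plugs the same mean-covariate profile into the censored-normal formula; `nd`, `mu`, `sigma`, `z` and `ey` are illustrative names, not part of either package.

```r
library(survival)  # survreg(), Surv() and the tobin data

m1 <- survreg(Surv(durable, durable > 0, type = "left") ~ age + quant,
              data = tobin, dist = "gaussian")

# Covariate profile at the sample means, as in the predict() call above
nd <- data.frame(lapply(model.frame(m1)[-1], mean))

mu <- predict(m1, newdata = nd, type = "lp")  # latent mean, E[Y*]
sigma <- m1$scale                              # estimated residual sd

# Assumption: ev targets E[Y] for Y = max(0, Y*), i.e. left-censored at zero
z <- mu / sigma
ey <- pnorm(z) * mu + sigma * dnorm(z)
c(latent = unname(mu), observed = unname(ey))
```

Because Y = max(0, Y*), E[Y] is always positive and at least max(0, E[Y*]), so a negative `predict()` value and a positive `ev` are not contradictory by themselves.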
Before digging into this further, is this issue adequately addressed by #315?
Thanks for the info. I'll take a look.
For reference: lines 148 to 163 in commit 0f4e1f1.
This is a hasty test to see whether `Y`, rather than `Y*`, is the issue. The example and the `Y` calculation procedure are taken directly from https://stats.stackexchange.com/a/149529.
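For reference, the Stack Exchange procedure is the standard expected value of a normal latent variable left-censored at zero. With $\mu = x'\beta$, $Y^* \sim N(\mu, \sigma^2)$, $Y = \max(0, Y^*)$, and $\phi$, $\Phi$ the standard normal density and CDF:

$$
\begin{aligned}
E[Y] &= P(Y^* > 0)\, E[Y^* \mid Y^* > 0] \\
     &= \Phi(\mu/\sigma)\left(\mu + \sigma\,\lambda(\mu/\sigma)\right), \qquad \lambda(z) = \frac{\phi(z)}{\Phi(z)} \\
     &= \Phi(\mu/\sigma)\,\mu + \sigma\,\phi(\mu/\sigma).
\end{aligned}
$$

In the AER code, `p0` is $\Phi(\mu/\sigma)$, `ey0` is $E[Y^* \mid Y^* > 0]$ and `ey` is $E[Y]$.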
```r
# Create data
N <- 10
f <- rep(c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"), N)
fcoeff <- rep(c(-1, -2, -3, -4, -3, -5, -10, -5), N)
set.seed(100)
x <- rnorm(8 * N) + 1
beta <- 5
epsilon <- rnorm(8 * N, sd = sqrt(1/5))
y.star <- x * beta + fcoeff + epsilon  # latent response
y <- y.star
y[y < 0] <- 0                          # censored response
test_data <- data.frame(y = y, zero = 0, x = x, f = f)
```
```r
# Estimate with AER -------------------------------------------------------
library(AER)
fit <- tobit(y ~ 0 + x + f, data = test_data)

# E[Y*] ---------------
mean(predict(fit))
## [1] 0.9823243

# E[Y] -----------
mu <- fitted(fit)
sigma <- fit$scale
p0 <- pnorm(mu / sigma)                    # P(Y* > 0)
lambda <- function(x) dnorm(x) / pnorm(x)  # inverse Mills ratio
ey0 <- mu + sigma * lambda(mu / sigma)     # E[Y* | Y* > 0]
ey <- p0 * ey0                             # E[Y]
c(mean = mean(ey, na.rm = TRUE), sd = sd(ey, na.rm = TRUE))
##     mean       sd
## 2.544692 3.354458
```
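A quick sanity check of the E[Y] formula, using only base R: it compares the closed form, the inverse-Mills-ratio form used in the AER code, and a Monte Carlo mean of max(0, Y*). The values `mu = -2` and `sigma = 5` are arbitrary, and `tobit_ey` / `tobit_ey_mills` are hypothetical helper names.

```r
# Closed-form mean of Y = max(0, Y*), with Y* ~ N(mu, sigma^2)
tobit_ey <- function(mu, sigma) {
  z <- mu / sigma
  pnorm(z) * mu + sigma * dnorm(z)
}

# Same quantity written with the inverse Mills ratio, as in the AER code
lambda <- function(z) dnorm(z) / pnorm(z)
tobit_ey_mills <- function(mu, sigma) {
  pnorm(mu / sigma) * (mu + sigma * lambda(mu / sigma))
}

set.seed(1)
mu <- -2     # arbitrary illustrative latent mean
sigma <- 5   # arbitrary illustrative latent sd
draws <- pmax(rnorm(1e6, mean = mu, sd = sigma), 0)  # censor latent draws at 0

c(closed_form = tobit_ey(mu, sigma),
  mills_form  = tobit_ey_mills(mu, sigma),
  monte_carlo = mean(draws))
```

The two analytic forms agree to floating-point precision, and the simulated mean should land within a few hundredths of them.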
```r
# Estimate with Zelig -------------------------------------------------------
library(Zelig)
zfit <- zelig(y ~ 0 + x + f, data = test_data, model = "tobit", cite = FALSE)

# E[Y*] ---------------
mean(unlist(predict(zfit)))
## [1] 0.9823243

# E[Y] ???? -----------
ev <- sim(setx(zfit))$get_qi(qi = "ev")
c(mean = mean(ev), sd = sd(ev))
##      mean        sd
## 0.6255104 0.1359505
```
Note that in this example:
```r
mean(y)
## [1] 2.51392
sd(y)
## [1] 3.353757
```
which is almost identical to the `E[Y]` computed with Achim's procedure from Stack Exchange, but very different from Zelig's `ev`. The `ev` from Zelig is also fairly different from the `E[Y*]` from `predict`.
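One possible source of the gap, offered as an assumption rather than a diagnosis: the AER `E[Y]` above averages per-observation expected values, while Zelig's `ev` is computed at a single covariate profile chosen by `setx()`. Because E[Y] is convex in μ, the average of E[Y_i] is at least E[Y] evaluated at the average μ, so the two summaries can differ even if both are computed correctly. A minimal sketch, refitting with `survival::survreg()` (which `AER::tobit()` wraps) so that only the survival package is needed; `tobit_ey`, `avg_of_ey` and `ey_of_avg` are hypothetical names.

```r
library(survival)

# Regenerate the simulated data from the example above (same seed and draw order)
N <- 10
f <- rep(c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"), N)
fcoeff <- rep(c(-1, -2, -3, -4, -3, -5, -10, -5), N)
set.seed(100)
x <- rnorm(8 * N) + 1
epsilon <- rnorm(8 * N, sd = sqrt(1/5))
y <- pmax(x * 5 + fcoeff + epsilon, 0)  # latent response censored at 0
test_data <- data.frame(y = y, x = x, f = factor(f))

# Tobit model as a left-censored gaussian survreg, same specification as above
fit <- survreg(Surv(y, y > 0, type = "left") ~ 0 + x + f,
               data = test_data, dist = "gaussian")

# E[Y] for a normal latent variable left-censored at zero
tobit_ey <- function(mu, sigma) pnorm(mu / sigma) * mu + sigma * dnorm(mu / sigma)
sigma <- fit$scale
lp <- predict(fit, type = "lp")

avg_of_ey <- mean(tobit_ey(lp, sigma))  # average of per-observation E[Y_i]
ey_of_avg <- tobit_ey(mean(lp), sigma)  # E[Y] at the average linear predictor
c(avg_of_ey = avg_of_ey, ey_of_avg = ey_of_avg)
```

This does not show which profile `setx()` actually uses here; it only illustrates why a profile-based `ev` and an averaged `E[Y]` are not directly comparable.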
@cchoirat am I missing something about what Zelig is trying to achieve (or have I misinterpreted the results from this test)?