prediction fails with poly() in formula

Question

prediction fails with poly() in formula

leeper opened this issue 6 years ago · comments

> x <- glm(Sepal.Length ~ stats::poly(Sepal.Width, 3), data = iris)
> prediction(x)
Average prediction for 150 observations: 5.8433
> prediction(x, at = list(Sepal.Width = 2:4))
Error in stats::poly(Sepal.Width, 3) : 
  'degree' must be less than number of unique points
> traceback()
11: stop("'degree' must be less than number of unique points")
10: stats::poly(Sepal.Width, 3)
9: eval(predvars, data, env)
8: eval(predvars, data, env)
7: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
6: model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
5: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type == 
       "link", "response", type), terms = terms, na.action = na.action)
4: predict.glm(model, newdata = out, type = type, se.fit = TRUE, 
       ...)
3: predict(model, newdata = out, type = type, se.fit = TRUE, ...)
2: prediction.glm(x, at = list(Sepal.Width = 2:4))
1: prediction(x, at = list(Sepal.Width = 2:4))

Thomas J. Leeper · Answer 1 · Thu Feb 15 2018 17:47:30 GMT+0800 (China Standard Time)

The underlying issue here is that predict() gives different values when the model uses stats::poly() versus just poly():

require("stats")
data(cars)
# model specified with `poly()`
m <- lm(dist ~ stats::poly(speed, 3), data = cars)
# model specified using modified terms
m2 <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)

# predictions are equivalent
all.equal(predict(m), predict(m2))

# but predictions with new data are not
all.equal(
 predict(m, newdata = data.frame(speed = 4:9)),
 predict(m2, newdata = data.frame(speed = 4:9))
)
# [1] "Mean relative difference: 0.8405171"

predict(m, newdata = data.frame(speed = 4:9))
#           1          2          3          4          5          6 
#  -36.600687  -4.520996  19.660440  46.227212  85.462909 147.651122
predict(m, newdata = data.frame(speed = 5:10))
#           1          2          3          4          5          6 
#  -36.600687  -4.520996  19.660440  46.227212  85.462909 147.651122

predict(m2, newdata = data.frame(speed = 4:9))
#         1         2         3         4         5         6 
#  2.760981  7.040541 10.928348 14.485912 17.774747 20.856365
predict(m2, newdata = data.frame(speed = 5:10))
#         1         2         3         4         5         6 
#  7.040541 10.928348 14.485912 17.774747 20.856365 23.792277

predict(lm(dist ~ poly(speed, 3), data = cars), newdata = data.frame(speed = 4:9))
#         1         2         3         4         5         6 
#  2.760981  7.040541 10.928348 14.485912 17.774747 20.856365 
predict(lm(dist ~ poly(speed, 3), data = cars), newdata = data.frame(speed = 5:10))
#         1         2         3         4         5         6 
#  7.040541 10.928348 14.485912 17.774747 20.856365 23.792277

If I can get prediction() to not encounter the "poly not found" error, then the above shouldn't be an issue.

Thomas J. Leeper · Answer 2 · Thu Feb 15 2018 18:31:47 GMT+0800 (China Standard Time)

Here's the specific error:

> prediction::prediction(glm(Sepal.Length ~ stats::poly(Sepal.Width, 3), data = iris))
Average prediction for 150 observations: 5.8433
> prediction::prediction(glm(Sepal.Length ~ poly(Sepal.Width, 3), data = iris))
Error in poly(Sepal.Width, 3, coefs = list(alpha = c(3.05733333333333,  : 
  could not find function "poly"
> traceback()
9: eval(predvars, data, env)
8: eval(predvars, data, env)
7: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
6: model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
5: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type == 
       "link", "response", type), terms = terms, na.action = na.action)
4: predict.glm(model, newdata = out, type = type, se.fit = TRUE, 
       ...)
3: predict(model, newdata = out, type = type, se.fit = TRUE, ...)
2: prediction.glm(glm(Sepal.Length ~ poly(Sepal.Width, 3), data = iris))
1: prediction::prediction(glm(Sepal.Length ~ poly(Sepal.Width, 3), 
       data = iris))

Thomas J. Leeper · Answer 3 · Sat Apr 28 2018 22:46:14 GMT+0800 (China Standard Time)

Probably the bug discussed here: https://stat.ethz.ch/pipermail/r-devel/2018-April/075889.html

Thomas J. Leeper · Answer 4 · Thu May 10 2018 21:40:54 GMT+0800 (China Standard Time)

Underlying issues in R's safe prediction patched here: wch/r-source@e372854

Thomas J. Leeper · Answer 5 · Thu May 10 2018 21:51:19 GMT+0800 (China Standard Time)

There were two issues here. One was that poly() was misbehaving in R itself. That's fixed. Using poly() rather than stats::poly() still fails. That's at least just an annoyance rather than a bug.