stekhoven / missForest

missForest is a nonparametric, mixed-type imputation method for basically any type of data for the statistical software R.

Home Page: http://stat.ethz.ch/CRAN/web/packages/missForest/index.html

Error if data contains character variables

pablo14 opened this issue

Hi! Nice package! I found a bug when the data frame contains a character column (instead of a factor).

Example:

library(missForest)
library(Lock5Data)
library(dplyr)
data("HollywoodMovies2011")
# Movie is the ID column, so drop it before imputing.
imputationResults <- missForest(xmis = select(HollywoodMovies2011, -Movie))

It throws the following warnings and error:
argument is not numeric or logical: returning NA
missForest iteration 1 in progress...
NAs introduced by coercion
Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, :
NA/NaN/Inf in foreign function call (arg 1)

On a hunch, I checked the column types and converted the only character variable to a factor, and now it doesn't crash. This works OK:

HollywoodMovies2011_copy <- HollywoodMovies2011
HollywoodMovies2011_copy$TheatersOpenWeek_2 <- as.factor(HollywoodMovies2011_copy$TheatersOpenWeek_2)
imputationResults <- missForest(xmis = select(HollywoodMovies2011_copy, -Movie))
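The same workaround can be generalized so that no column names are hard-coded. The following is only a minimal sketch in base R: `chars_to_factors` is a hypothetical helper (it is not part of missForest), and `toy` is made-up data used purely for illustration.

```r
# Hypothetical helper (not part of missForest): convert every character
# column of a data frame to a factor and leave all other columns untouched.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Made-up toy data: two numeric columns and one character column,
# each with a missing value.
toy <- data.frame(
  x = c(1.2, NA, 3.1, 4.8, 5.0, 6.3),
  y = c(10, 20, NA, 40, 50, 60),
  g = c("a", "b", "a", NA, "b", "a"),
  stringsAsFactors = FALSE
)

# Inspect the column classes before and after the conversion.
print(sapply(toy, class))
toy_fixed <- chars_to_factors(toy)
print(sapply(toy_fixed, class))
```

With the real data, the converted frame could then presumably be passed on as before, e.g. `missForest(xmis = chars_to_factors(select(HollywoodMovies2011, -Movie)))`. Missing values in the character column stay `NA` after the conversion, so they are still available for imputation.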

cheers :)