jaredhuling / oem

Penalized least squares estimation using the Orthogonalizing EM (OEM) algorithm

Home Page: http://jaredhuling.org/oem

n slightly > p

mrrizkallah opened this issue

Dear OEM,

I have been using cv.oem while trying to keep n > p. The problem is that once n gets close to p, the R session either crashes or never finishes, on both Windows and Linux (in both RStudio and the console). The issue gets worse with grouping and cross-validation. Please find my example below.

Thank you.

nobs  <- 150
nvars <- 60

X <- matrix(rnorm(nobs * nvars), ncol = nvars)
group.indicators <- rep(1:(nvars / 10), each = 10)  # 6 groups of 10 variables, one label per column of X
y <- rbinom(nobs, 1, 
  prob = 1 / (1 + exp(-drop(X %*% c(0.15, 0.15, -0.15, -0.15, 0.25, rep(0, nvars - 5)))))
)

input <- X

train_rows <- sample(1:nobs, 0.66 * nobs)
x.train <- as.matrix(input[train_rows, ])
x.test <- as.matrix(input[- (train_rows), ])

y.train <- y[train_rows]
y.test <- y[-train_rows]

cvfit <- oem::cv.oem(
  x = x.train, y = y.train,
  penalty = "grp.lasso",
  groups = group.indicators,
  type.measure = "auc",
  nlambda = 100,
  grouped = TRUE,
  family = "binomial"
)
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.1.19 

The OEM algorithm has poor convergence properties in this scenario, so I would recommend using another algorithm instead. OEM is explicitly designed for the n >> p case.
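
As a rough illustration of why n close to p is hard, here is a minimal base-R sketch. It assumes that OEM, like other iterative first-order schemes, slows down as X'X becomes ill-conditioned; that is an assumption made for illustration, not something taken from the package documentation. The helper `gram_condition_number` is hypothetical and not part of oem. It simulates Gaussian designs with the same p = 60 as in the report and prints the condition number of X'X for several values of n, including the 99 training rows left by the 66% split above.

```r
# Hypothetical helper (not part of oem): condition number of X'X
# for a simulated Gaussian design with nobs rows and nvars columns.
gram_condition_number <- function(nobs, nvars) {
  X <- matrix(rnorm(nobs * nvars), ncol = nvars)
  # Eigenvalues of the symmetric Gram matrix X'X, largest first
  ev <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  max(ev) / min(ev)
}

set.seed(1)
nvars <- 60
for (nobs in c(6000, 600, 99, 66)) {
  kappa <- gram_condition_number(nobs, nvars)
  cat(sprintf("n = %4d, p = %d, cond(X'X) = %.1f\n", nobs, nvars, kappa))
}
```

The condition number should grow sharply as n approaches p. Cross-validation makes this worse, since each fold is fit on fewer rows than the full training set.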