sparse matrices and irlba

Question


simon-anders opened this issue 6 years ago.

The help page for 'irlba' states that A can be a 'numeric real- or complex-valued matrix or real-valued sparse matrix'. I tried it with a 'dgTMatrix' and it blew up my memory. Looking into the source code, I found a comment saying that the fast C implementation only works for dgCMatrices.

Two suggestions:

1. Please state explicitly in the help page that only compressed sparse column matrices (class dgCMatrix) are supported, and that other sparse types (such as triplet-form dgTMatrix) will get expanded to dense form.
2. Maybe issue a warning when a sparse matrix gets expanded using 'as.matrix'.
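A minimal sketch of the obvious workaround, assuming the Matrix and irlba packages are installed: coerce the triplet-form matrix to compressed sparse column form with Matrix's `as(x, "CsparseMatrix")` before calling irlba, so the input is a dgCMatrix and no dense copy is ever made. The matrix below is random test data, not the data from this report.

```r
library(Matrix)
library(irlba)

set.seed(1)
# Random 1000 x 1000 sparse matrix in triplet (dgTMatrix) form
A <- spMatrix(1000, 1000,
              i = sample(1000, 5000, replace = TRUE),
              j = sample(1000, 5000, replace = TRUE),
              x = rnorm(5000))
is(A, "dgTMatrix")   # TRUE

# Coerce to compressed sparse column form; stays sparse, no dense copy
B <- as(A, "CsparseMatrix")
is(B, "dgCMatrix")   # TRUE

# Top 5 singular values computed on the dgCMatrix
s <- irlba(B, nv = 5)
s$d
```

Duplicate (i, j) triplets are summed during the coercion, which matches how a dgTMatrix is interpreted in the first place.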

PS: Thanks a lot for irlba -- especially the sparse implementation is really useful for my project.

B. W. Lewis · Answer 1 · Thu Jan 24 2019 07:29:05 GMT+0800 (China Standard Time)

Yes indeed, only dgCMatrix matrices are supported internally. Sorry about that!

I will implement these suggestions. A warning might not be enough, though, since warnings are deferred by default in R. Perhaps an error that directs the user to call as.matrix explicitly would be safer?

Longer term, it would be nice to add support for the rest of the available SuiteSparse matrix types...

B. W. Lewis · Answer 2 · Mon Jan 28 2019 22:48:04 GMT+0800 (China Standard Time)

Sorry, just getting around to fixing this carefully. I see the documentation already says this:

#' The \code{fastpath=TRUE} option only supports real-valued matrices and sparse matrices
#' of type \code{dgCMatrix} (for now). Other problems fall back to the reference
#' R implementation.

Indeed, the code now automatically falls back to the slower (non-C) path when a sparse matrix other than a dgCMatrix is supplied; see line 336 of irlba.R. Here is an example:

library(Matrix)
library(irlba)
A <- spMatrix(1000, 1000, i = sample(1000, 5000, replace = TRUE), j = sample(1000, 5000, replace = TRUE), x = rnorm(5000))
svd(A)$d[1:5]   # slow: dense SVD of the full matrix
# [1] 6.499243 5.740359 5.650176 5.606647 5.382097
irlba(A, 5)$d   # dgTMatrix input, so irlba uses the R fallback path
# [1] 6.499243 5.740359 5.650176 5.606647 5.382097
class(A)
# [1] "dgTMatrix"

I'm now thinking that perhaps you hit some other bug. Can you provide a reproducible example?

Thanks,

Bryan

Simon Anders · Answer 3 · Mon Jan 28 2019 23:00:23 GMT+0800 (China Standard Time)

Does the fallback R code convert the sparse matrix to a dense one, or can it work directly on sparse matrices? By "blew up my memory" I meant that my memory ran out, presumably because the code tried to convert my 500 MB sparse matrix into a 50 GB dense matrix, which of course caused R to hang.

B. W. Lewis · Answer 4 · Mon Jan 28 2019 23:03:50 GMT+0800 (China Standard Time)

No, except in the special case of tiny matrices (smaller than 6x6). So the memory pressure came from something else, possibly a more serious problem.

prcomp_irlba does compute extra centering and scaling vectors, each of length equal to the number of columns, before invoking irlba. But that is probably not a problem...

Simon Anders · Answer 5 · Mon Jan 28 2019 23:39:23 GMT+0800 (China Standard Time)

> prcomp_irlba does compute extra centering and scaling vectors, each of length equal to the number of columns, before invoking irlba. But that is probably not a problem...

Are you sure?

Centering does make a matrix lose its sparsity. After all, if you subtract, say, a column mean µ, all the zeros turn into -µ, so the matrix is no longer sparse. It may still be stored in a sparse format, but then it takes even more memory than the dense form.
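To make this concrete, here is a small sketch using random test data from the Matrix package's `rsparsematrix()` (sizes and density are arbitrary choices). It centers each column explicitly, as PCA centering would, then compares the number of nonzero entries and the storage cost of keeping the centered result in sparse versus dense form.

```r
library(Matrix)

set.seed(1)
# 1000 x 200 sparse test matrix with about 1% nonzero entries
A <- rsparsematrix(1000, 200, density = 0.01)
nnzero(A)              # roughly 2000 of 200000 entries

# Explicitly subtract each column mean
mu <- colMeans(A)
Ac <- sweep(as.matrix(A), 2, mu)
sum(Ac != 0)           # essentially every entry is now nonzero

# Storing the centered matrix as a dgCMatrix costs more than the dense form,
# because each stored entry needs a row index in addition to its value
Ac_sparse <- Matrix(Ac, sparse = TRUE)
object.size(Ac)
object.size(Ac_sparse)
```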

That's why I asked in #47 whether it's possible at all to treat sparse matrices appropriately in PCA.

Though I'm realizing now that I'm writing in #46 here. Are we talking about irlba or prcomp_irlba at the moment?

B. W. Lewis · Answer 6 · Tue Jan 29 2019 00:19:22 GMT+0800 (China Standard Time)

I'm referring to prcomp_irlba in this thread.

The irlba algorithm centers matrices implicitly, without ever forming the centered matrix (see the center argument described in the irlba help page). All prcomp_irlba does is compute the column means and optional scaling factors to set up the call to irlba.
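The identity that makes implicit centering possible: with column means mu and a vector of ones e, (A - e mu^T) v = A v - e (mu^T v) and (A - e mu^T)^T u = A^T u - mu (e^T u). Each matrix-vector product an iterative SVD needs therefore costs one sparse product plus a cheap correction. The sketch below checks this on random data and compares irlba's `center` argument against a dense SVD of the explicitly centered matrix. It illustrates the idea only and is not irlba's internal code; `rsparsematrix()` from Matrix is used just to generate test data.

```r
library(Matrix)
library(irlba)

set.seed(1)
A  <- rsparsematrix(500, 100, density = 0.02)  # random sparse test matrix
mu <- colMeans(A)                             # column means used for centering
v  <- rnorm(ncol(A))
u  <- rnorm(nrow(A))

# Centered products computed from A alone: sparse product plus a correction
Av_implicit  <- as.vector(A %*% v) - sum(mu * v)          # (A - e mu^T) v
Atu_implicit <- as.vector(crossprod(A, u)) - mu * sum(u)  # (A - e mu^T)^T u

# Dense reference: explicitly centered matrix (only feasible for small examples)
Ac <- sweep(as.matrix(A), 2, mu)
all.equal(Av_implicit,  as.vector(Ac %*% v))          # TRUE
all.equal(Atu_implicit, as.vector(crossprod(Ac, u)))  # TRUE

# irlba with center = mu works from A and mu; compare with the dense SVD
s <- irlba(A, nv = 3, center = mu)
cbind(irlba = s$d, dense_svd = svd(Ac)$d[1:3])
```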

B. W. Lewis · Answer 7 · Tue Jan 29 2019 00:20:40 GMT+0800 (China Standard Time)

Oops, sorry, I'm in the wrong thread; these comments belong in #47.

B. W. Lewis · Answer 8 · Tue Jan 29 2019 00:21:46 GMT+0800 (China Standard Time)

Sorry about that, I mixed up the issue threads. Maybe we can close this one?