bwlewis / irlba

Fast truncated singular value decompositions

sparse matrices and irlba

simon-anders opened this issue · comments

The help page for 'irlba' states that A can be a 'numeric real- or complex-valued matrix or real-valued sparse matrix'. I tried it with a 'dgTMatrix' and it blew up my memory. Looking into the source code, I found a comment that the fast C implementation only works for dgCMatrix objects.

Two Suggestions:

  1. Please state explicitly in the help page that only compressed sparse column matrices (class dgCMatrix) are supported and that other sparse types (such as triplet-form dgTMatrix objects) will get expanded to dense form.

  2. Maybe issue a warning when a sparse matrix gets expanded using 'as.matrix'.
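
On the user side, one way to sidestep the question is to convert to dgCMatrix before calling irlba. The following is a minimal sketch, assuming only the Matrix package; the helper name `to_csc` and its warning text are hypothetical and not part of irlba.

```r
library(Matrix)

# Hypothetical helper (not part of irlba): convert a sparse Matrix that is not
# already a dgCMatrix to compressed sparse column form, warning when it does so.
to_csc <- function(A) {
  if (is(A, "sparseMatrix") && !is(A, "dgCMatrix")) {
    warning("converting ", class(A)[1], " to compressed sparse column form",
            call. = FALSE)
    A <- as(A, "CsparseMatrix")
  }
  A
}

set.seed(1)
A <- spMatrix(100, 100,
              i = sample(100, 500, replace = TRUE),
              j = sample(100, 500, replace = TRUE),
              x = rnorm(500))            # triplet form (dgTMatrix)
B <- suppressWarnings(to_csc(A))         # column-compressed form
class(A)
class(B)
```

The conversion stays sparse throughout, so it should cost little memory compared with a dense expansion.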

PS: Thanks a lot for irlba -- especially the sparse implementation is really useful for my project.

Yes indeed, only dgCMatrix matrices are supported internally. Sorry about that!

I will implement these suggestions; the warning might not be enough, since warnings are deferred by default in R. Maybe an error that directs the user to call as.matrix explicitly would be safer?
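
The error-based variant could look roughly like this. It is only a sketch of the idea, not irlba code: the function name `check_no_densify` and the message wording are hypothetical.

```r
library(Matrix)

# Hypothetical guard (not irlba code): stop immediately instead of silently
# expanding an unsupported sparse class to a dense matrix.
check_no_densify <- function(A) {
  if (is(A, "sparseMatrix") && !is(A, "dgCMatrix")) {
    stop("sparse input of class ", class(A)[1], " is not handled directly; ",
         "convert it with as(A, \"CsparseMatrix\") or densify it explicitly ",
         "with as.matrix(A)", call. = FALSE)
  }
  invisible(A)
}

A <- spMatrix(10, 10, i = 1:10, j = 1:10, x = rnorm(10))   # dgTMatrix
msg <- tryCatch({
  check_no_densify(A)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

Unlike a warning, which R collects and prints only after the top-level call returns, `stop()` halts before any large allocation can happen.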

Longer term, it would be nice to add support for the rest of the available SuiteSparse matrix types...

Sorry, just getting 'round to carefully fixing this. I see that the documentation already says this:

#' The \code{fastpath=TRUE} option only supports real-valued matrices and sparse matrices
#' of type \code{dgCMatrix} (for now). Other problems fall back to the reference
#' R implementation.

Indeed, the code does now automatically fall back to the slow (non-C code) path if a sparse matrix that is not dgCMatrix is supplied, see line 336 of irlba.R. Here is an example:

library(Matrix)
library(irlba)

# Random sparse matrix in triplet form (class dgTMatrix)
A <- spMatrix(1000, 1000, i=sample(1000,5000,replace=TRUE), j=sample(1000,5000,replace=TRUE), x=rnorm(5000))
svd(A)$d[1:5]   # slow
# [1] 6.499243 5.740359 5.650176 5.606647 5.382097
irlba(A,5)$d
# [1] 6.499243 5.740359 5.650176 5.606647 5.382097
class(A)
# [1] "dgTMatrix"

I'm thinking now that perhaps you hit some other bug? Can you provide a reproducible example?

Thanks,

Bryan

Does the fall-back R code convert the sparse matrix to a dense one, or can it work directly on sparse matrices? By "blew up my memory", I meant that my memory ran out, presumably because the code tried to convert my 500 MB sparse matrix into a 50 GB dense matrix, which of course caused R to hang.
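
The size gap between the two representations can be estimated without ever allocating the dense matrix. A small sketch with made-up dimensions (the 2000 x 2000 size and 1% density are assumptions for illustration, not the matrix from this report):

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(2000, 2000, density = 0.01)   # about 40,000 nonzeros

# Memory actually used by the sparse object, in bytes.
sparse_bytes <- as.numeric(object.size(A))

# Memory a dense double matrix of the same shape would need:
# 8 bytes per entry, computed without allocating it.
dense_bytes <- prod(dim(A)) * 8

c(sparse_MB = sparse_bytes / 1e6, dense_MB = dense_bytes / 1e6)
```

The same arithmetic applied to a large, very sparse matrix gives the kind of 100-fold blow-up described above.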

No, the fall-back R code works directly on the sparse matrix, except for a special case of tiny matrices (smaller than 6x6). So the memory pressure was due to something else, maybe a more serious problem.

The prcomp_irlba function does compute extra centering and scaling vectors, of length equal to the number of columns, before invoking irlba. But that is probably not a problem...

Are you sure?

Centering does cause a matrix to lose sparseness. After all, if you subtract, say, a row mean µ, all the zeroes turn into -µ, and so the matrix is no longer sparse, even though it may still be stored in a sparse form -- and then it takes even more memory than the dense form.
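
The loss of sparseness under explicit centering is easy to see on a small example. A sketch assuming the Matrix package, with column centering as in PCA and arbitrary illustrative dimensions:

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(200, 50, density = 0.05)   # 5% of entries are nonzero

nnz_before <- nnzero(A)

# Explicit column centering: every former zero in column j becomes -mean_j.
centered <- sweep(as.matrix(A), 2, colMeans(A))
nnz_after <- sum(centered != 0)

c(before = nnz_before, after = nnz_after, total = length(centered))
```

After centering, essentially every entry is nonzero, so storing the result in a sparse class would cost more than the dense form.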

That's why I asked in #47 whether it's possible at all to treat sparse matrices appropriately in PCA.

Though I'm realizing now that I'm writing in #46 here. Are we talking about irlba or prcomp_irlba at the moment?

I'm referring to prcomp_irlba in this thread.

The irlba algorithm centers matrices implicitly, without ever forming the centered matrix (this is described in the help for irlba). All prcomp_irlba does is compute the column means and optional scaling to set up a call to irlba.
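
The idea behind implicit centering can be sketched with a single matrix-vector product: for column means mu, (A - 1 mu') x equals A x minus the scalar mu' x, so A itself stays sparse. This is an illustration of the identity, not irlba's internal code; the final step uses irlba's `center` argument and only runs if the package is installed.

```r
library(Matrix)

set.seed(1)
A  <- rsparsematrix(300, 40, density = 0.05)
mu <- colMeans(A)
x  <- rnorm(ncol(A))

# Implicit centering: (A - 1 mu') x = A x - (mu' x); A is never densified.
implicit <- as.numeric(A %*% x) - sum(mu * x)

# Explicit dense reference, formed here only to check the identity.
explicit <- as.numeric(sweep(as.matrix(A), 2, mu) %*% x)
max_diff <- max(abs(implicit - explicit))
max_diff

# Optional comparison against a dense SVD of the centered matrix.
if (requireNamespace("irlba", quietly = TRUE)) {
  s   <- irlba::irlba(A, nv = 3, center = mu)
  ref <- svd(sweep(as.matrix(A), 2, mu))$d[1:3]
  print(cbind(irlba = s$d, dense = ref))
}
```

The product with the transpose works the same way, which is all an iterative SVD method needs.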

Oops, sorry, I am in the wrong thread; these comments belong in #47.

Sorry about that, I mixed up the issue threads. Maybe we can close this one?