Possible performance issue in irlba with sparse matrices
eromero-vlc opened this issue
irlba seems to take much longer than expected with sparse matrices. Here is a comparison with RSpectra:
require(irlba)
require(RSpectra)
require(Matrix)
A <- as(sparseMatrix(i=1:5000,j=1:5000,x=1:5000), "dgCMatrix");
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,verbose=TRUE))
#> Working dimension size 47
#> Initializing starting vector v with samples from standard normal distribution.
#> Use `set.seed` first for reproducibility.
#> irlba: using fast C implementation
#> user system elapsed
#> 7.696 0.000 7.680
r$mprod
#> [1] 3796
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,work=80,verbose=TRUE))
#> Working dimension size 80
#> Initializing starting vector v with samples from standard normal distribution.
#> Use `set.seed` first for reproducibility.
#> irlba: using fast C implementation
#> user system elapsed
#> 1.904 0.000 1.905
r$mprod
#> [1] 1424
system.time(r<-RSpectra::svds(A,40,tol=1e-5))
#> user system elapsed
#> 0.192 0.000 0.193
r$nops
#> [1] 1141
Increasing the maximum basis size (`work`) reduces the time, but it is still about ten times slower than RSpectra. I suspect an issue with the C matrix-vector product implementation or with the restarting.
Thanks. First, please note that RSpectra has some other issues; see https://bwlewis.github.io/irlba/comparison.html.
Oddly, I get nearly the opposite result on my system:
require(irlba)
## Loading required package: irlba
## Loading required package: Matrix
require(RSpectra)
## Loading required package: RSpectra
require(Matrix)
set.seed(1)
A <- as(sparseMatrix(i=1:5000,j=1:5000,x=1:5000), "dgCMatrix")
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5))
## user system elapsed
## 8.304 2.364 2.677
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,work=80))
## user system elapsed
## 2.428 0.504 0.735
system.time(r<-RSpectra::svds(A,40,tol=1e-5))
## user system elapsed
## 15.068 0.000 15.090
This was tested with:
R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status beta
major 3
minor 4.0
year 2017
month 04
day 08
svn rev 72499
language R
version.string R version 3.4.0 beta (2017-04-08 r72499)
packageDescription("irlba")
Package: irlba
Type: Package
Title: Fast Truncated SVD, PCA and Symmetric Eigendecomposition for
Large Dense and Sparse Matrices
Version: 2.2.0
Package: RSpectra
Type: Package
Title: Solvers for Large Scale Eigenvalue and SVD Problems
Version: 0.12-0
All on my quad-core AMD A10-7850K home PC with 16 GB RAM.
I can think of two things that might account for this:
- I used the version of irlba from GitHub, which may be a bit faster than the CRAN version.
- irlba uses the same BLAS/LAPACK libraries that R does. On my system R is linked against OpenBLAS (http://www.openblas.net/). If R is built against the unoptimized reference BLAS, this can be very slow (for lots of other R functions too).
Can you check which BLAS/LAPACK library your R is using?
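A quick way to check from within R (a sketch; note that `sessionInfo()` reports the BLAS/LAPACK shared-library paths only in R >= 3.4, and only on platforms where they can be detected):

```r
# Print session details; in R >= 3.4 this includes the paths of the
# BLAS and LAPACK shared libraries the running R is linked against.
sessionInfo()

# The LAPACK version R was built with:
La_version()

# On Linux you can also inspect the linked libraries from a shell,
# e.g.:  ldd $(R RHOME)/lib/libR.so | grep -i -E 'blas|lapack'
```

If the reported path points at the reference BLAS shipped with R rather than an optimized library such as OpenBLAS, that would explain the slowdown.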
Any other ideas?
FYI here is an old note I wrote on how I like to configure BLAS for R:
http://illposed.net/r-on-linux.html
P.S. Sorry about the long latency.
I was using the R shipped with the SUSE distribution, and it turns out that build is configured with an unoptimized BLAS. After recompiling R with OpenBLAS, the performance is quite similar to what you report. Thanks!