bwlewis / irlba

Fast truncated singular value decompositions

prcomp_irlba

erichson opened this issue · comments

You give the following example for prcomp_irlba in the documentation; however, the results are not consistent with prcomp:

> set.seed(1)
> x  <- matrix(rnorm(200), nrow=20)
> p1 <- prcomp_irlba(x, n=3)
> summary(p1)
Importance of components%s:
                         PC1    PC2    PC3
Standard deviation     1.541 1.2513 1.1916
Proportion of Variance 0.443 0.2921 0.2649
Cumulative Proportion  0.443 0.7351 1.0000
> p2 <- prcomp(x, tol=0.7)
> summary(p2)
Importance of first k=3 (out of 10) components:
                          PC1    PC2    PC3
Standard deviation     1.5411 1.2513 1.1916
Proportion of Variance 0.2806 0.1850 0.1678
Cumulative Proportion  0.2806 0.4656 0.6334

Thanks for noting this! I will add a comment to the documentation for prcomp_irlba explicitly about this.

It's not an error -- the differences arise because prcomp_irlba computes something different than prcomp: a truncated principal components decomposition. The variance in the discarded components is unknown, so the cumulative summaries cannot be computed the same way.
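For concreteness, here is a small sketch of why the two summaries disagree (it uses only base R's prcomp, so it runs without irlba installed; the variable names are mine, not the package's). A truncated method only sees the top k singular values, so the only denominator available to it is their own sum:

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)

sdev_all <- prcomp(x)$sdev   # all 10 standard deviations
sdev_k   <- sdev_all[1:3]    # what a rank-3 truncated method retains

# Denominator = top-k sum: this is all a truncated decomposition can know,
# which is why prcomp_irlba's proportions sum to 1 after three components.
prop_truncated <- sdev_k^2 / sum(sdev_k^2)

# Denominator = total variance: what summary(prcomp(x)) reports.
prop_full <- sdev_k^2 / sum(sdev_all^2)
```

With the seed above, prop_truncated reproduces the 0.443/0.292/0.265 row from the prcomp_irlba summary, while prop_full reproduces the 0.2806/0.1850/0.1678 row from prcomp.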

The current documentation has some notes on the differences, but doesn't mention your example so I will add that. Thanks for pointing it out, the added documentation will help others.

Best,

Bryan

OK, see ?prcomp_irlba and example(prcomp_irlba). Let me know if you think the example can be better. Feel free to send a PR!

Bryan, that is certainly a fix, but I would suggest the approach I take for computing the randomized principal component analysis: https://github.com/Benli11/rSVD/blob/master/R/rpca.R, line 203.

a) You compute the total variance of the input data as sigma <- sum(apply(Re(A), 2, stats::var)), where A denotes the input matrix.

b) Then you can compute the explained-variance ratio as sdev^2 / sigma, where sdev denotes the square roots of the approximated dominant k eigenvalues.
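Putting (a) and (b) together, a minimal sketch (variable names are mine; the full prcomp result stands in for an irlba-style truncated fit, which is safe here because the leading sdev values coincide):

```r
set.seed(1)
A <- matrix(rnorm(200), nrow = 20)
k <- 3

# (a) total variance of the input data, summed column by column
sigma <- sum(apply(Re(A), 2, stats::var))

# (b) explained-variance ratio of the dominant k components
sdev  <- prcomp(A)$sdev[1:k]   # stand-in for a truncated method's sdev
ratio <- sdev^2 / sigma
```

Since prcomp centers by default and stats::var uses the same n-1 denominator, sigma equals sum(prcomp(A)$sdev^2), so ratio reproduces prcomp's "Proportion of Variance" row for PC1..PC3 even though only k values were retained.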

Best,
Ben

Hi Ben,

Thanks! That didn't occur to me.

I didn't realize someone else had implemented the Martinsson et al. method in R. I guess we have two implementations for now. I added references to your package in the help and vignette.

I'm wondering if I should just remove that algorithm from the irlba package in the medium term and simply point to your package instead. Would you consider adding a convergence implementation similar to the one in irlba? Perhaps I can send a pull request sometime... What would you prefer?

I have found that block methods are superior for pathological problems with clustered singular values; see the examples in irlba. That's why I ended up putting the Martinsson method in there in the first place. Ultimately I'd like to have a solid block irlba implementation in there, but I've had trouble getting something that works well in general (the nice original paper by Baglama and Reichel, http://www.math.uri.edu/~jbaglama/papers/paper17.pdf, has some implementation issues that I have never resolved and that sometimes result in poor performance).

Again, thanks for helping to improve irlba with your suggestion. Further collaboration would be great!

Best,

Bryan

  • @benli11 perhaps you might be interested in co-maintaining irlba? Just a thought...

Bryan, thanks for adding the references. That is really great! Whether you deprecate the irlba implementation is up to you, of course. Anyway, I would be happy to add variants to rsvd if you send a pull request. I think that could make the rsvd package stronger.

I would be very happy to help co-maintain the irlba package, if I can make helpful contributions. I will send you an email with ideas for a future collaboration. Also, I am looking forward to discussing some of the pathological problems and seeing if we can come up with better implementations.

Best,
Ben