flashxio / FlashX

FlashX is a collection of big data analytics tools that perform data analytics in the form of graphs and matrices.

Home Page:http://flashx.io/

[FlashR]: compute eigenvalues in blocks.

zheng-da opened this issue

The benefit of computing eigenvalues in blocks is that it minimizes the amount of data written to SSDs.
The drawback is that the computation needs more iterations.

For example,

library(FlashGraphR)
fg.set.conf("matrix/conf/run_test-IM.txt")
fg <- fg.load.graph("/mnt/nfs/graph-data/friendster.adj", "/mnt/nfs/graph-data/friendster.index")
spm <- fg.get.sparse.matrix(fg)
mul <- function(x, extra) spm %*% x

# Compute the 20 largest-magnitude eigenvalues in a single call.
start <- Sys.time()
res <- fm.eigen(mul, nrow(spm), sym=TRUE, k=20, which="LM")
end <- Sys.time()
print(end - start)

# Compute 10 more eigenvalues, passing the first 10 eigenpairs as already known.
start <- Sys.time()
res2 <- fm.eigen(mul, nrow(spm), sym=TRUE, k=10, which="LM",
                 prev.eval=res$values[1:10], prev.evec=res$vectors[,1:10])
end <- Sys.time()
print(end - start)

Computing 20 eigenvalues in one call takes 180 iterations and 10.89 minutes.
Computing only the 11th-20th eigenvalues, given the first 10, takes 170 iterations and 11.93 minutes.
In other words, there is no advantage to computing just a few eigenvalues first and then the next few.
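
For readers unfamiliar with how a second block can reuse the eigenpairs of the first, the sketch below shows one common approach, deflation, in base R on a small dense symmetric matrix. It is an illustration under assumptions: the issue does not say how fm.eigen uses prev.eval and prev.evec internally, and the names A, A2, vals1, vecs1 and vals2 are hypothetical. Base R's eigen() stands in for the iterative solver.

```r
# Illustration only: base R on a small dense matrix, not FlashR.
set.seed(1)
n <- 50
m <- matrix(rnorm(n * n), n, n)
A <- (m + t(m)) / 2                      # symmetric test matrix

# Reference spectrum, ordered by largest magnitude (like which="LM").
ref <- eigen(A, symmetric = TRUE)
ord <- order(abs(ref$values), decreasing = TRUE)
ref.vals <- ref$values[ord]

# Block 1: assume a solver has already returned the top-k eigenpairs.
k <- 5
vals1 <- ref$values[ord[1:k]]
vecs1 <- ref$vectors[, ord[1:k], drop = FALSE]

# Deflation (assumed technique): subtract the known eigenpairs so their
# eigenvalues become 0 while the rest of the spectrum stays unchanged.
# vals1 * t(vecs1) scales row i of t(vecs1) by vals1[i].
A2 <- A - vecs1 %*% (vals1 * t(vecs1))

# Block 2: the largest-magnitude eigenvalues of the deflated matrix are
# the next k eigenvalues of the original matrix.
e2 <- eigen(A2, symmetric = TRUE)
ord2 <- order(abs(e2$values), decreasing = TRUE)
vals2 <- e2$values[ord2[1:k]]
```

Here vals2 should match ref.vals[(k + 1):(2 * k)] up to rounding. The sketch shows only that the second block is recoverable from the first; it says nothing about iteration counts or I/O, which are what the timings above measure.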

commit 38f2b1ac3018b9b6f0e799f3945218e7385e160d
Author: Da Zheng <zhengda1936@gmail.com>
Date:   Wed Apr 5 10:32:32 2017 -0400

    [R]: add fm.eigen.block.
    
    This computes eigenvalues in blocks to minimize the amount of data
    written to disks.