RevolutionAnalytics / RHadoop

RHadoop

Home Page: https://github.com/RevolutionAnalytics/RHadoop/wiki

Is it possible to distribute a sparse matrix?

hetong007 opened this issue

I am testing sparse matrix operations on RHadoop, but they do not seem to be supported.

The following is a piece of reproducible code:

require(rhdfs)
require(rmr2)
tmp = rmr.options(backend='local')

PageRank.mr = function(input, num.iter, dims) {
    V = rep(1/dims,dims)
    pr.map = function(., M) {
        keyval(1, M %*% V)
    }
    pr.reduce = function(k, Z) {
        vec = as.vector(Z)
        keyval(k, vec)
    }
    for(i in 1:num.iter) {
        result = mapreduce(input, map = pr.map, reduce = pr.reduce)
        V = values(from.dfs(result))
        V = V/sum(V)
    }
    return(V)
}

# Testing dense matrix
M = matrix(c(0,1/3,1/3,1/3,
             1/2,0,1/2,0,
             0,0,0,1,
             1/2,1/2,0,0),4,4)
Dist.M = to.dfs(M)
# The result
PageRank.mr(Dist.M,25,4)
# [1] 0.2647051 0.2352933 0.2058834 0.2941182

# Testing sparse Matrix
require(Matrix)
edgeList = cbind(c(1,1,1,2,2,3,4,4),
                 c(2,3,4,1,3,4,1,2))
spMat = spMatrix(nrow = 4, ncol = 4,
                 i = edgeList[,2], j = edgeList[,1], x = rep(1,nrow(edgeList)))
spMat = as(spMat,'dgCMatrix')
colS = colSums(spMat)
spMat = spMat %*% Diagonal(x = 1/colS)
Dist.spM = to.dfs(spMat)
# Not running
PageRank.mr(Dist.spM,25,4)
# Error in M %*% V : non-conformable arguments

This program calculates PageRank. It works well with a dense matrix, but the function to.dfs seems to fail when splitting the sparse matrix. I get the non-conformable arguments error because the matrix sent to each node is converted to a vector rather than kept as a matrix.

Yes, as you found out, rmr2 doesn't support sparse matrices. I am not sure
why you thought it did. You could represent a sparse matrix as a data frame
with columns i, j, value and write a converter between that and the class you
want to use, then pass only the data frame to rmr2 API calls.
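
A minimal sketch of that converter idea, using only the Matrix package. The helper names sparse_to_triplets, triplets_to_sparse and triplet_matvec are made up for illustration and are not part of rmr2 or Matrix. The multiplication works directly on the i, j, value rows, so it does not need the sparse class at all.

```r
library(Matrix)

# Hypothetical helper: sparse matrix -> data frame with columns i, j, value.
# summary() on a numeric sparse matrix lists one row per stored entry
# with 1-based indices in columns i, j and the value in column x.
sparse_to_triplets <- function(m) {
  s <- summary(m)
  data.frame(i = s$i, j = s$j, value = s$x)
}

# Hypothetical helper: data frame with columns i, j, value -> sparse matrix.
triplets_to_sparse <- function(df, dims) {
  sparseMatrix(i = df$i, j = df$j, x = df$value, dims = dims)
}

# Hypothetical helper: compute M %*% v from the triplets alone.
# Each row contributes value * v[j] to output element i.
triplet_matvec <- function(df, v, nrow) {
  contrib <- df$value * v[df$j]
  sums <- tapply(contrib, df$i, sum)
  out <- numeric(nrow)
  out[as.integer(names(sums))] <- as.vector(sums)
  out
}

# Same graph as in the issue, built as a column-normalized sparse matrix.
edges <- cbind(from = c(1, 1, 1, 2, 2, 3, 4, 4),
               to   = c(2, 3, 4, 1, 3, 4, 1, 2))
sp <- sparseMatrix(i = edges[, "to"], j = edges[, "from"],
                   x = rep(1, nrow(edges)), dims = c(4, 4))
sp <- sp %*% Diagonal(x = 1 / colSums(sp))

# Power iteration on the data frame representation only.
tri <- sparse_to_triplets(sp)
v <- rep(1 / 4, 4)
for (k in 1:25) {
  v <- triplet_matvec(tri, v, 4)
  v <- v / sum(v)
}
print(v)
```

With rmr2, the data frame tri would be what goes through to.dfs, and the per-row product could presumably move into the map function (emitting i as the key and value * V[j] as the value) with a reduce that sums per key. That wiring is an assumption and is not tested here.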

On Fri, Dec 26, 2014 at 11:59 PM, Tong He notifications@github.com wrote:

Good point! Thanks.