mikera / vectorz

Fast and flexible numerical library for Java featuring N-dimensional arrays

clarification needed: is VectorMatrixMN expected to be sparse?

robyn-kozierok opened this issue · comments

Would it be inefficient to use VectorMatrixMN for a dense matrix?

Or maybe the better question is, if I have a Matrix I need to build, and I know the number of rows in advance, and the row data could either be generated in a Vector or an array, what would be the most efficient Matrix class to use?

It's actually going to be the second multiplicand, so probably a matrix stored as columns would be best, but I don't see a non-sparse option for that.

I have an n x n SparseRowMatrix that will be multiplied by an n x k dense matrix, where n is large (tens or hundreds of thousands) and k is small (under 10).

Thanks in advance for your help and advice!

Further to my question above, I actually have an iterative process where I have to replace a bunch of rows in the n x k matrix with a saved set of rows (which are currently saved in another Matrix, but I could store them however makes sense).

So I have t which is my n x n SparseRowMatrix and y which is my n x k other matrix, and I need to do

y = t.innerProduct(y);

Then replace a bunch (m = about n/4) of the rows in y with rows from another dense matrix (or whatever storage for the saved rows makes sense). (If it matters, the rows being replaced are the top m rows of y.)

Then iterate, and do the multiplication and replacement again (on the order of 10 iterations)
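For readers following along, the loop described above can be sketched in plain Java without the library. This is only an illustration of the multiply-then-replace pattern and is not the vectorz API: the class name `SparseIterationSketch`, the per-row `cols`/`vals` sparse layout, and the flat row-major array for y are all assumptions made for the example.

```java
import java.util.Arrays;

/** Plain-Java sketch (hypothetical names, not vectorz): y = t * y, then restore the top m rows. */
class SparseIterationSketch {

    /**
     * Multiplies a sparse n x n matrix, stored row by row as (column index, value)
     * pairs, by a dense n x k matrix stored row-major in a flat array.
     */
    static double[] multiply(int[][] cols, double[][] vals, double[] y, int k) {
        int n = cols.length;
        double[] out = new double[n * k];
        for (int i = 0; i < n; i++) {
            for (int p = 0; p < cols[i].length; p++) {
                int j = cols[i][p];
                double v = vals[i][p];
                // Row j of y occupies k consecutive slots, so this loop reads contiguous memory.
                for (int c = 0; c < k; c++) {
                    out[i * k + c] += v * y[j * k + c];
                }
            }
        }
        return out;
    }

    /** Runs the multiply-then-replace loop; saved holds the m replacement rows, row-major. */
    static double[] iterate(int[][] cols, double[][] vals, double[] y0,
                            double[] saved, int k, int iterations) {
        double[] y = y0;
        for (int it = 0; it < iterations; it++) {
            y = multiply(cols, vals, y, k);
            // The top m rows form one contiguous block, so a single bulk copy replaces them all.
            System.arraycopy(saved, 0, y, 0, saved.length);
        }
        return y;
    }

    public static void main(String[] args) {
        int[][] cols = {{1}, {2}, {0}};            // 3 x 3 matrix with one non-zero per row
        double[][] vals = {{1.0}, {2.0}, {1.0}};
        double[] y0 = {1, 2, 3, 4, 5, 6};          // 3 x 2 dense matrix, row-major
        double[] saved = {9, 9};                   // m = 1 saved row
        System.out.println(Arrays.toString(iterate(cols, vals, y0, saved, 2, 2)));
    }
}
```

Because the rows being replaced are the top m rows, they sit at the start of a row-major array, which is why one copy covers the whole chunk in this sketch. Whether the library exposes an equivalent bulk operation is not something this example establishes.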

So, I think upon reflection, that representing y in a specialty way as either row-based or column-based storage would be problematic, and it should probably just be a basic AMatrix -- in which case I won't have replaceRow available, so is it more efficient to do a setRow or to set the elements one-by-one? Obviously if there is a method that would let me efficiently replace an entire chunk of the matrix with the contents of another matrix, that would be great.

Thanks!

I just want to add a BIG thanks, as my runtime for my n=32K test case is down under 2 minutes (using setRow to do the row replacement mentioned above).

The best I'd gotten before with my hacked multiplication to avoid having to do a bunch of setRows for normalizing my zero rows was about 25 minutes, and the best while including the setRows was many hours, so this is a huge improvement already!

And my apologies for morphing this issue/question so much from what I asked originally, although it would still be helpful to have that point clarified as well, for future reference. :)

A straight Matrix is usually the most efficient for a dense matrix. If you know in advance that your data will be dense, it is best to use that.

Occasionally you want column based storage, but I think that is only really true for sparse matrices. You can try benchmarking a Matrix vs a SparseColumnMatrix (constructed entirely of dense Vector columns) as a second multiplication operand if you like, but I'm pretty sure the regular Matrix will be faster.

Thanks. Given that I also have to replace rows, on reflection it seems that neither a row-Vector-based nor a column-Vector-based matrix will be suitable.

When replacing rows in a regular dense matrix, do you have a feel for whether setRow is best, or setting the elements one-by-one?

Thanks again for all your help.

-----Original Message-----
From: Mike Anderson [mailto:notifications@github.com]
Sent: Tuesday, January 21, 2014 2:08 AM
To: mikera/vectorz
Cc: Kozierok, Robyn
Subject: Re: [vectorz] clarification needed: is VectorMatrixMN expected to be
sparse? (#14)

A straight Matrix is usually the most efficient for a dense matrix. If you know
in advance that your data will be dense, it is best to use that.

Occasionally you want column based storage, but I think that is only really true
for sparse matrices. You can try benchmarking a Matrix vs a
SparseColumnMatrix (constructed entirely of dense Vector columns) as a
second multiplication operand if you like, but I'm pretty sure the regular
Matrix will be faster.


setRow should be very fast on a regular dense matrix - it's a heavily optimised operation. Definitely don't worry about that one. It will be better than setting elements one by one.

In fact setRow is usually going to be better than one-by-one setting in any circumstance.
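To illustrate why a whole-row operation tends to beat element-by-element writes on dense row-major storage, here is a minimal plain-Java sketch. It is not vectorz's implementation of setRow; the class `RowCopySketch` and both method names are hypothetical, and the flat `double[]` layout is an assumption for the example.

```java
import java.util.Arrays;

/** Hypothetical helper (not vectorz code) contrasting two ways to overwrite one row. */
class RowCopySketch {

    /** Overwrites row i with a single bulk copy; element (i, j) lives at data[i * cols + j]. */
    static void setRowBulk(double[] data, int cols, int i, double[] row) {
        if (row.length != cols) {
            throw new IllegalArgumentException("row length must equal the column count");
        }
        System.arraycopy(row, 0, data, i * cols, cols);
    }

    /** Produces the same result one element at a time, recomputing the index on each write. */
    static void setRowElementwise(double[] data, int cols, int i, double[] row) {
        if (row.length != cols) {
            throw new IllegalArgumentException("row length must equal the column count");
        }
        for (int j = 0; j < cols; j++) {
            data[i * cols + j] = row[j];
        }
    }

    public static void main(String[] args) {
        double[] a = new double[3 * 2];   // 3 x 2 matrix of zeros
        double[] b = new double[3 * 2];
        double[] row = {7.0, 8.0};
        setRowBulk(a, 2, 1, row);
        setRowElementwise(b, 2, 1, row);
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both methods leave the array in the same state. The bulk version hands the whole row to `System.arraycopy`, which the JVM can typically execute as an optimized block copy, while the loop pays for index arithmetic on every element. In a library, an element setter may add per-call overhead on top of that.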

Awesome, thanks.

-----Original Message-----
From: Mike Anderson [mailto:notifications@github.com]
Sent: Tuesday, January 21, 2014 11:11 AM
To: mikera/vectorz
Cc: Kozierok, Robyn
Subject: Re: [vectorz] clarification needed: is VectorMatrixMN expected to be
sparse? (#14)

setRow should be very fast on a regular dense matrix - it's a heavily optimised
operation. Definitely don't worry about that one.

