pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem

Home Page: https://sparse.pydata.org

Additional Random Generator

smldub opened this issue

Problem:
Problem:
The sparse.random function is very fast for tensors below roughly 1E9 elements, but beyond that point the two current index-generation methods run into drawbacks:

  1. random_state.choice requires enough memory to hold an array with as many entries as the tensor has elements, and in my experience this becomes very slow for tensors larger than 1E9 elements (see the sketch after this list).
  2. The set/hashing method used when the density is below 0.3 is limited by being a Python for loop, and it takes quite some time to generate more than about 1E6 random elements (a count well below the 0.3 density threshold for a 1E9-element tensor). I think this is likely the result of a hash-table lookup on every iteration.
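
To make drawback 1 concrete, here is a minimal sketch (the sizes and seed are only examples, and the note about the internal permutation reflects my understanding of the legacy RandomState implementation):

import numpy as np

# Illustration only: with the legacy RandomState API, choice(..., replace=False)
# shuffles a full permutation of the population internally, so memory and time
# scale with the total number of elements rather than with nnz.
N = int(1E9)     # total number of elements in the tensor
nnz = int(1E6)   # number of nonzero entries we actually want
rs = np.random.RandomState(42)
flat_idx = rs.choice(N, size=nnz, replace=False)  # roughly 8 GB of intermediate int64 data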

Potential Solution:
  1. Generate an array of flat indices using np.random.randint that is nnz (the number of nonzero entries desired) long.
  2. Use np.unique to both sort the indices and remove any repeats.
  3. Go back to step 1, but decrease the desired length to the number of entries still needed, and np.hstack the resulting arrays.

np.unique also sorts the results, which makes construction of the COO matrix a little faster.

Example Code:

import numpy as np
import sparse


def rand(shape, nnz=None, density=0.1):
    elements = np.prod(shape)
    if nnz is None:
        nnz = int(elements * density)
    if nnz > elements:  # my sad attempt at preventing errors
        print("fail")
        return None
    # Generate an initial guess for the flat indices, then drop repeats.
    out = np.random.randint(elements, size=nnz)
    out = np.unique(out)  # remove the repeated indices (also sorts them)
    nnztemp = len(out)
    while nnztemp < nnz:  # loop to draw the remaining indices
        out = np.hstack((out, np.random.randint(elements, size=int(nnz - nnztemp))))
        out = np.unique(out)
        nnztemp = len(out)
    # Convert the flat (1-D) indices to N-D coordinates.
    out = np.array(np.unravel_index(out, shape), dtype=np.int64)
    return sparse.COO(out, data=np.random.rand(out.shape[1]), shape=shape)
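
For reference, the sketch above could be called like this (the shape matches the benchmark below; the density is only an example value):

# Hypothetical usage of the rand() sketch above.
s = rand((100, 100, 100, 1000), density=1E-4)
print(s.shape, s.nnz)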

Here is a plot for a tensor with 1E9 elements (shape (100, 100, 100, 1000)), comparing the speed of generating a random sparse array with the default sparse.random function (orange) vs. the function above (blue).

[Figure 1: timing of the default sparse.random (orange) vs. the proposed function (blue) for a 1E9-element tensor]

The new method also performs relatively well in the low-density limit for particularly large tensors, because the first batch of candidate indices usually contains no repeats.

I would be interested to hear any feedback on the idea before I try to implement something.

  1. random_state.choice requires enough memory to hold an array with as many entries as the tensor has elements, and in my experience this becomes very slow for tensors larger than 1E9 elements

I find this quite surprising; I would have imagined it takes O(nnz) time, since it can use this algorithm.

Maybe you could experiment with Numba to implement it? 😉 https://numba.readthedocs.io/en/stable/reference/numpysupported.html#random

Those definitely look interesting, but I wonder if they operate with an unnecessary handicap because the size of the reservoir is unknown, while it is known in our case. I'll test against some other sample-without-replacement algorithms after I do a little research.

What about Fisher-Yates, or a variation of it?
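
One such variation avoids materializing the full index array: a partial Fisher-Yates shuffle over an implicit arange(N) that only remembers the swapped slots. The sketch below is purely illustrative (the function name and the dict-based bookkeeping are my own choices, not existing sparse code):

import numpy as np

def partial_fisher_yates(N, nnz, seed=None):
    # Partial Fisher-Yates shuffle over an implicit arange(N): only slots that
    # have been swapped are stored in a dict, so memory is O(nnz) rather than
    # O(N).  Returns nnz distinct flat indices in [0, N).
    rng = np.random.default_rng(seed)
    swapped = {}
    out = np.empty(nnz, dtype=np.int64)
    for i in range(nnz):
        j = rng.integers(i, N)            # pick a slot from the untouched tail
        out[i] = swapped.get(j, j)        # value currently sitting in slot j
        swapped[j] = swapped.get(i, i)    # move slot i's value into slot j
    return out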

After a little bit of research, I came across these two articles, which lay out some interesting paths.
https://arxiv.org/abs/1610.05141
https://arxiv.org/abs/2104.05091
I've implemented the D algorithm (by Vitter), which is supposedly the slowest of the really fast algorithms according to the first article, but has the nice property that its cost per sample is constant, so the runtime depends on the number of samples rather than on the size of the tensor.

I've plotted its performance building a sparse array vs. the traditional sparse.random, with the size of the tensor listed on top (Algorithm D in blue, sparse.random in orange).

[Figure: timing of Algorithm D (blue) vs. sparse.random (orange) for the tensor sizes listed above each plot]

I think that the parallel algorithms listed in the first paper would be interesting, but the code required just to build a simple random array would probably balloon pretty fast.
The first paper claims the B algorithm,
https://dl.acm.org/doi/pdf/10.1145/214392.214402
could be implemented to run faster than D even without parallelization, so that might be worth checking out too.

Here is the D algorithm if you don't want to type it out yourself.

import numpy as np
import numba
import sparse


@numba.jit(nopython=True, nogil=True)
def algD(n, N):
    # Vitter's Algorithm D: draw n sorted indices without replacement from
    # range(N), using O(n) memory and expected O(n) time.
    n = int(n + 1)  # sample via n skips; the loop below stops once n reaches 1
    N = int(N)
    j = -1
    qu1 = N - n + 1
    negalphainv = -13
    threshold = -negalphainv * n

    nreal = float(n)
    Nreal = float(N)
    nmin1inv = 1.0 / (n - 1)
    Vprime = np.exp(np.log(np.random.rand()) / n)
    qu1real = 1.0 - nreal + Nreal
    i = 0
    arr = np.zeros(n - 1)  # sampled (sorted) flat indices, stored as floats
    while n > 1:
        nmin1inv = 1.0 / (nreal - 1.0)
        while True:
            # Step D2: generate a candidate skip length X (and its floor S).
            while True:
                X = Nreal * (-Vprime + 1.0)
                S = np.floor(X)
                if S < qu1:
                    break
                Vprime = np.exp(np.log(np.random.rand()) / n)
            U = np.random.rand()
            negSreal = -S
            # Step D3: quick acceptance test.
            y1 = np.exp(np.log(U * Nreal / qu1real) * nmin1inv)
            Vprime = y1 * (-X / Nreal + 1.0) * (qu1real / (negSreal + qu1real))
            if Vprime <= 1.0:
                break
            # Step D4: full acceptance test.
            y2 = 1.0
            top = Nreal - 1.0
            if n - 1 > S:
                bottom = Nreal - nreal
                limit = N - S
            else:
                bottom = negSreal + Nreal - 1.0
                limit = qu1

            t = N - 1
            while t >= limit:
                y2 *= top / bottom
                top -= 1.0
                bottom -= 1.0
                t -= 1
            # Accept S if N/(N - X) >= y1 * y2**(1/(n-1)).
            if Nreal / (-X + Nreal) >= y1 * np.exp(np.log(y2) * nmin1inv):
                Vprime = np.exp(np.log(np.random.rand()) * nmin1inv)
                break
            Vprime = np.exp(np.log(np.random.rand()) / n)
        # Step D5: skip S records and select the next one.
        j += S + 1
        arr[i] = j
        i += 1
        N = -S + (N - 1)
        Nreal = negSreal + (-1.0 + Nreal)
        n -= 1
        nreal -= 1.0
        ninv = nmin1inv
        qu1 = -S + qu1
        qu1real = negSreal + qu1real
        threshold += negalphainv
    return arr
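
A rough usage sketch, mirroring the plumbing of the earlier rand() example (the shape and nnz values are only illustrative):

# Hypothetical usage: draw nnz flat indices and build a COO array from them.
shape = (100, 100, 100, 1000)
nnz = 1_000_000
N = int(np.prod(np.array(shape, dtype=np.int64)))  # total number of elements
flat = algD(nnz, N).astype(np.int64)               # sorted flat indices
coords = np.array(np.unravel_index(flat, shape), dtype=np.int64)
s = sparse.COO(coords, data=np.random.rand(len(flat)), shape=shape)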

Please go ahead and add it! 😄 I still think we have some minor kinks to work out, but that can be done better on the PR with comments.

I have uploaded the code I've written so far to my branch, but pytest isn't happy with it yet. Should I push that to the main branch even though it will be full of errors? Sorry, I'm not really familiar with the whole GitHub workflow.

When you open a pull request, it isn't automatically merged. In fact, that's the best way to resolve errors together since we can see the changes and work on fixes.

Also, the tests run on Continuous Integration, so I can see what the failures are.

In addition, you can open a pull request from a branch other than main -- that's usually how it's done. Feel free to ask follow-up questions; I'm happy to help.

Also, feel free to use Gitter for higher-frequency communication.