jeroenjanssens / scikit-sos

A Python implementation of the Stochastic Outlier Selection algorithm


Issues with large data sets

Suparno89-zz opened this issue

I just tried this code on a data set with ~25,000 rows. It hangs on this line

D = np.sqrt(np.add(np.add(-2 * np.dot(X, X.T), sumX).T, sumX))

and throws a "FloatingPointError: invalid value encountered in sqrt". I then tried it with only the first 1,000 rows, but this time it stopped at the line

H = np.log(sumA) + beta * np.sum(D * A) / sumA

with a "divided by zero" runtime exception.

I also tried it with even smaller data sets (~50 rows) and it worked like a charm. I don't know why this issue only shows up with larger data sets. I hope it will be fixed in a later release.
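
For anyone hitting the same `sqrt` error: one plausible cause (an assumption, not confirmed in this thread) is that the expansion `-2 * X·Xᵀ + sumX + sumXᵀ` yields tiny negative values through floating-point rounding, for example on the diagonal or for duplicate rows, and `np.sqrt` of a negative number is invalid. The sketch below clips the squared distances at zero before taking the root. The function name `pairwise_euclidean` is hypothetical, and `sumX` is assumed to hold the row-wise sums of squares. Note also that a full 25,000 x 25,000 `float64` matrix takes about 5 GB, which may explain the apparent hang.

```python
import numpy as np


def pairwise_euclidean(X):
    """Pairwise Euclidean distances, clipped so sqrt never sees a negative.

    Hypothetical helper for illustration; not part of scikit-sos.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: sumX is the squared norm of every row.
    sumX = np.sum(np.square(X), axis=1)
    # Same expression as in the issue, but without the sqrt yet.
    D2 = np.add(np.add(-2 * np.dot(X, X.T), sumX).T, sumX)
    # Rounding can leave values like -1e-16; clamp them to exactly zero.
    np.maximum(D2, 0.0, out=D2)
    return np.sqrt(D2)
```

If the error persists after clipping, the input itself probably contains `NaN` or `inf`, which `np.isfinite(X).all()` would reveal.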

I wonder whether this is due to the size of the dataset, or just because of the values themselves. This might have been fixed by #5. It's been a while since you opened this issue, so I'm going to close it. If this problem still exists, we can open a new issue. Cheers.
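
On the "size or values" question, the division by zero can in principle come from the values alone. Assuming `A` is computed as `np.exp(-D * beta)` and `sumA = np.sum(A)` (inferred from the line quoted above, not verified against the current code), every entry of `A` underflows to zero once `D * beta` is large, so `sumA` becomes 0. A minimal sketch of one common workaround, shifting by the smallest distance before exponentiating (the log-sum-exp trick), is shown below. The name `get_perplexity_stable` is hypothetical.

```python
import numpy as np


def get_perplexity_stable(D_row, beta):
    """Entropy H and (rescaled) affinities for one row of distances.

    Hypothetical variant for illustration; assumes A = exp(-D * beta).
    """
    D_row = np.asarray(D_row, dtype=float)
    # Shift so the smallest distance maps to exp(0) = 1; sumA is then >= 1.
    shift = D_row.min()
    A = np.exp(-(D_row - shift) * beta)
    sumA = np.sum(A)
    # log(sum(exp(-D * beta))) == log(sumA) - beta * shift, and the
    # weighted mean sum(D * A) / sumA is unchanged by the rescaling.
    H = np.log(sumA) - beta * shift + beta * np.sum(D_row * A) / sumA
    return H, A
```

The returned `A` differs from `np.exp(-D_row * beta)` only by a constant factor, which cancels if the affinities are normalized by their row sum afterwards.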

Still the same problem. It hasn't been resolved; please look into it.

Thank you