Issues with large data sets

Suparno89-zz opened this issue 10 years ago

I just tried this code on a data set with ~25,000 rows. It hangs on this line

D = np.sqrt(np.add(np.add(-2 * np.dot(X, X.T), sumX).T, sumX))

and then throws "FloatingPointError: invalid value encountered in sqrt". I then tried it with only the first 1,000 rows, but this time it stopped at the line

H = np.log(sumA) + beta * np.sum(D * A) / sumA

with a "divided by zero" runtime exception.

With even smaller data sets (~50 rows) it worked like a charm. I don't know why the problem only shows up with larger data sets. I hope it will be fixed in a later release.
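One possible cause of the sqrt error, offered as a guess rather than a diagnosis: the expression inside np.sqrt is a squared distance obtained by expanding ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, and floating-point rounding can leave slightly negative entries for rows that are identical or nearly identical. More rows means more pairs, so at least one such entry becomes more likely. The sketch below assumes sumX holds the row-wise sums of squares of X (its definition is not shown in the post) and clips negatives to zero before taking the square root. The name pairwise_distances is hypothetical, not part of the library.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix with rounding-induced negatives clipped."""
    X = np.asarray(X, dtype=float)
    # Assumption: sumX is the per-row sum of squares, as the formula implies.
    sumX = np.sum(np.square(X), axis=1)
    # Same expansion as in the post, but without the sqrt yet.
    sq = np.add(np.add(-2 * np.dot(X, X.T), sumX).T, sumX)
    # Rounding can leave tiny negative values; force them to zero.
    np.maximum(sq, 0.0, out=sq)
    # The distance from a point to itself is exactly zero.
    np.fill_diagonal(sq, 0.0)
    return np.sqrt(sq)
```

As for the apparent hang, the full distance matrix for ~25,000 rows has 25,000 x 25,000 float64 entries, which is about 5 GB before counting temporaries, so memory pressure may be a factor there.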

Jeroen Janssens · Answer 1 · Thu Sep 22 2016 14:39:22 GMT+0800 (China Standard Time)

I wonder whether this is due to the size of the dataset, or just because of the values themselves. This might have been fixed by #5. It's been a while since you opened this issue, so I'm going to close it. If this problem still exists, we can open a new issue. Cheers.
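If the values rather than the row count are the culprit, one plausible mechanism for the "divided by zero" error is underflow. Assuming A is computed as np.exp(-D * beta) and sumA is its sum (the thread does not show those lines), large distances or a large beta make every entry of A round to 0.0, so sumA becomes zero. A minimal sketch of a common workaround is to subtract the smallest distance before exponentiating; this leaves the normalized probabilities and H unchanged algebraically but guarantees sumA >= 1. The function name entropy_shifted and the argument D_row (distances from one point to the others) are hypothetical.

```python
import numpy as np

def entropy_shifted(D_row, beta):
    """Entropy H and normalized affinities for one row, computed stably."""
    D_row = np.asarray(D_row, dtype=float)
    # Shift so the smallest distance becomes 0; exp(0) = 1 keeps sumA >= 1.
    shifted = D_row - D_row.min()
    A = np.exp(-beta * shifted)
    sumA = np.sum(A)
    # Equal to log(sum(exp(-beta*D))) + beta * sum(D * exp(-beta*D)) / sum(exp(-beta*D)),
    # because the shift terms cancel between the two summands.
    H = np.log(sumA) + beta * np.sum(shifted * A) / sumA
    return H, A / sumA
```

With distances around 1000 and beta = 1, the unshifted np.exp(-D * beta) is 0.0 for every entry, while the shifted version stays finite.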

NakulSai Adapala · Answer 2 · Mon Apr 30 2018 17:59:12 GMT+0800 (China Standard Time)

Still the same problem. It hasn't been resolved; please look into it.

Thank you.