Improve k-means code. #helpwanted

Question

jasonbaldridge opened this issue 11 years ago

The current k-means implementation is something I wrote for homework assignments in the NLP courses I teach at UT Austin. It handles moderately sized inputs, but it runs out of steam (in particular, memory) on larger datasets, especially those with many features. It currently represents the features of each data point as a dense vector, so switching to sparse vectors should be a fairly straightforward win.
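To make the dense-versus-sparse point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not this project's code and not Breeze's API). It assumes a hypothetical sparse representation, a dict from feature index to non-zero value, and computes the squared Euclidean distance to a dense centroid using the expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, so that only the non-zero features of the data point are visited. The names `SparseVec`, `to_sparse`, `squared_distance`, and `nearest_centroid` are made up for this example.

```python
from typing import Dict, List

# Hypothetical sparse representation: feature index -> non-zero value.
SparseVec = Dict[int, float]


def to_sparse(dense: List[float]) -> SparseVec:
    """Keep only the non-zero entries of a dense feature vector."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def squared_distance(point: SparseVec, centroid: List[float],
                     centroid_sq_norm: float) -> float:
    """Squared Euclidean distance via ||x||^2 - 2 x.c + ||c||^2.

    Only the non-zero entries of `point` are visited; the centroid's
    squared norm is passed in so it can be computed once per iteration.
    """
    point_sq_norm = sum(v * v for v in point.values())
    dot = sum(v * centroid[i] for i, v in point.items())
    return point_sq_norm - 2.0 * dot + centroid_sq_norm


def nearest_centroid(point: SparseVec, centroids: List[List[float]],
                     centroid_sq_norms: List[float]) -> int:
    """Index of the closest centroid (the k-means assignment step)."""
    dists = [squared_distance(point, c, n)
             for c, n in zip(centroids, centroid_sq_norms)]
    return min(range(len(dists)), key=dists.__getitem__)


# Usage: a 4-feature point with two non-zero entries and two dense centroids.
point = to_sparse([0.0, 3.0, 0.0, 4.0])
centroids = [[0.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 4.0]]
norms = [sum(x * x for x in c) for c in centroids]
closest = nearest_centroid(point, centroids, norms)
```

The centroids stay dense here because averaging many sparse points generally produces a dense vector; the memory saving comes from storing the data points themselves sparsely.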

David Hall · Answer 1 · Wed Apr 17 2013 03:55:16 GMT+0800 (China Standard Time)

As is my (bad) habit, the k-means(++) implementation in Breeze is generic on the vector type, so it can use SparseVectors.

-- David

Jason Baldridge · Answer 2 · Wed Apr 17 2013 03:57:18 GMT+0800 (China Standard Time)

Awesome. This may be sorted out directly as we transition things from Breeze then.