Improve k-means code. #helpwanted
jasonbaldridge opened this issue · comments
The current k-means implementation is something I did for homework assignments for teaching NLP courses at UT Austin. It can handle a fair amount, but it runs out of steam (in particular, memory) for larger datasets, especially if they have a lot of features. It currently uses dense vectors to represent the features for each data point, so it should be a fairly straightforward win to change this to use sparse vectors instead.
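To make the dense-versus-sparse point concrete, here is a minimal sketch (not the project's actual code) of the assignment step with sparse data points. It assumes a hypothetical representation where each point is a dict from feature index to nonzero value and centroids stay dense. The names `SparseVec`, `to_sparse`, `sq_dist`, and `nearest` are made up for illustration. The distance uses the identity ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, so only the nonzero entries of each point are touched.

```python
from typing import Dict, List

# Hypothetical representation: feature index -> nonzero value.
SparseVec = Dict[int, float]


def to_sparse(dense: List[float]) -> SparseVec:
    """Keep only the nonzero entries of a dense feature vector."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sq_dist(x: SparseVec, centroid: List[float], centroid_sq_norm: float) -> float:
    """Squared Euclidean distance between a sparse point and a dense centroid.

    Uses ||x||^2 - 2 x.c + ||c||^2, iterating only over the nonzeros of x.
    centroid_sq_norm is ||c||^2, precomputed once per centroid per iteration.
    """
    x_sq_norm = sum(v * v for v in x.values())
    dot = sum(v * centroid[i] for i, v in x.items())
    return x_sq_norm - 2.0 * dot + centroid_sq_norm


def nearest(x: SparseVec, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to x (the k-means assignment step)."""
    # Recomputed here for simplicity; a real loop would cache these per iteration.
    norms = [sum(c * c for c in cen) for cen in centroids]
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k], norms[k]))
```

With this layout, the memory per data point scales with its number of nonzero features rather than with the full feature count, which is where the saving for high-dimensional NLP features would come from.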
As is my (bad) habit, the K-means(++) impl in Breeze is generic on vector
type, so it can use SparseVectors.
-- David
On Tue, Apr 16, 2013 at 12:45 PM, Jason Baldridge
notifications@github.com wrote:
The current k-means implementation is something I did for homework
assignments for teaching NLP courses at UT Austin. It can handle a fair
amount, but it runs out of steam (in particular, memory) for larger
datasets, especially if they have a lot of features. It currently uses
dense vectors to represent the features for each data point, so it should
be a fairly straightforward win to change this to use sparse vectors
instead.
Awesome. This may be sorted out directly as we transition things from Breeze then.