thejefflarson / fast-cluster

fast-cluster

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

* fast_cluster *

Cluster documents using LSH in linear time.

$ make
$ ./fast_cluster <file> <ngrams>

example:
$ find path_to_documents | xargs -I{} ./fast_cluster {} 5 | cut -f1 | sort -n | uniq -c
... list of count id pairs ...
$ find path_to_documents | grep '<interesting_id>'

About

fast-cluster


Languages

Language:C++ 80.0%Language:C 20.0%