ALShum / MinHashLSH

Java implementation for MinHash and LSH for finding near duplicate documents as measured by Jaccard similarity.

MinHashLSH

Implementation of MinHash for approximating the Jaccard similarity between text documents.
Also includes an implementation of locality-sensitive hashing (LSH), a fast way to find approximate nearest neighbors without comparing every pair of documents.
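
As a rough illustration of the two ideas above, here is a minimal sketch. It is not the code from this repository: the class name MinHashSketch, its method names, the word-level tokenization, the hash family (a*x + b mod a prime) and the parameters (100 hash functions, 20 bands) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; not part of the MinHashLSH repository.
class MinHashSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a; // multipliers, one per hash function
    private final long[] b; // offsets, one per hash function

    MinHashSketch(int numHashes, long seed) {
        Random rng = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rng.nextInt(Integer.MAX_VALUE - 1); // in [1, PRIME - 1]
            b[i] = rng.nextInt(Integer.MAX_VALUE);         // in [0, PRIME - 1]
        }
    }

    // Assumption: a document is represented by its set of lowercase words.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison.
    static double jaccard(Set<String> x, Set<String> y) {
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // MinHash signature: for each hash function, the minimum hash over all terms.
    int[] signature(Set<String> terms) {
        int[] sig = new int[a.length];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (String t : terms) {
            long x = Math.floorMod((long) t.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                int h = (int) ((a[i] * x + b[i]) % PRIME);
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    // Fraction of signature positions that agree; this estimates the Jaccard similarity.
    static double estimate(int[] s1, int[] s2) {
        int agree = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) agree++;
        }
        return (double) agree / s1.length;
    }

    // LSH banding: split the signature into bands and hash each band to a bucket key.
    // Two documents become a candidate pair if they share a key in the same band.
    static int[] bandKeys(int[] sig, int bands) {
        int rows = sig.length / bands;
        int[] keys = new int[bands];
        for (int band = 0; band < bands; band++) {
            keys[band] = Arrays.hashCode(
                Arrays.copyOfRange(sig, band * rows, (band + 1) * rows));
        }
        return keys;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(100, 42L);
        Set<String> d1 = terms("the quick brown fox jumps over the lazy dog");
        Set<String> d2 = terms("the quick brown fox jumped over a lazy dog");
        int[] s1 = mh.signature(d1);
        int[] s2 = mh.signature(d2);
        System.out.println("exact Jaccard:      " + jaccard(d1, d2));
        System.out.println("MinHash estimate:   " + estimate(s1, s2));

        int[] k1 = bandKeys(s1, 20);
        int[] k2 = bandKeys(s2, 20);
        boolean candidate = false;
        for (int i = 0; i < k1.length; i++) {
            if (k1[i] == k2[i]) candidate = true;
        }
        System.out.println("LSH candidate pair: " + candidate);
    }
}
```

With b bands of r rows each, a pair with Jaccard similarity s shares at least one band with probability 1 - (1 - s^r)^b, so the choice of b and r sets the similarity level at which documents start to be reported as candidates. The values used here are arbitrary.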

Languages

Language: Java 100.0%