alecuba16 / mapreduce_minhash_lsh

A simple implementation of minHash LSH in Hadoop MapReduce

minHash LSH

A simple implementation of the minHash algorithm in Hadoop MapReduce. The output is a CSV file listing the candidate pairs that exceed the Jaccard similarity threshold of 0.8.
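To make the idea concrete, here is a minimal single-JVM sketch of minHash signatures and the similarity estimate they give. It is not the repository's code and does not use Hadoop. The class name `MinHashSketch`, its method names, the linear hash family, and the seed parameter are all assumptions chosen for illustration; only the 0.8 threshold comes from the description above.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; not part of the repository.
public class MinHashSketch {
    // Mersenne prime 2^31 - 1, used as the modulus of the hash family (an assumption).
    private static final long PRIME = 2147483647L;

    // Builds a minHash signature: for each of numHashes hash functions
    // h(x) = (a * x + b) mod PRIME, keep the minimum value over all items.
    public static int[] signature(Set<String> items, int numHashes, long seed) {
        Random rng = new Random(seed); // same seed => same hash functions for every set
        int[] sig = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            long a = 1 + rng.nextInt((int) (PRIME - 1)); // 1 .. PRIME-1
            long b = rng.nextInt((int) PRIME);           // 0 .. PRIME-1
            long min = Long.MAX_VALUE;
            for (String item : items) {
                long x = item.hashCode() & 0x7fffffffL;  // non-negative 31-bit value
                long h = (a * x + b) % PRIME;            // fits in a long without overflow
                if (h < min) {
                    min = h;
                }
            }
            // An empty set keeps the sentinel Integer.MAX_VALUE in every position.
            sig[i] = (int) Math.min(min, Integer.MAX_VALUE);
        }
        return sig;
    }

    // Fraction of signature positions that agree; this estimates the Jaccard similarity.
    public static double estimatedSimilarity(int[] s1, int[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures must have the same length");
        }
        if (s1.length == 0) {
            return 0.0;
        }
        int matches = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                matches++;
            }
        }
        return (double) matches / s1.length;
    }

    // Exact Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty.
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // A pair is reported as a candidate when its estimate exceeds the threshold (0.8 above).
    public static boolean isCandidate(int[] s1, int[] s2, double threshold) {
        return estimatedSimilarity(s1, s2) > threshold;
    }
}
```

In a MapReduce setting, one plausible arrangement (again an assumption, not a description of this repository) is for the mapper to compute each document's signature and emit it keyed by band hash, and for the reducer to compare the signatures that share a key and write the pairs for which `isCandidate` returns true.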

Languages

Language:Java 100.0%