easonchan1213/minhash

##Introduction

This is a prototype implementation of the MinHash technique for quickly estimating how similar two sets are.
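
As a rough illustration of the idea (not the code in this repository), the sketch below builds a MinHash signature for each set and estimates similarity as the fraction of signature positions that agree. The object name `MinHashSketch`, the hash family `h(x) = (a*x + b) mod p`, and the modulus `2^31 - 1` are assumptions chosen for the example; the prototype may draw its hash functions differently.

```scala
import scala.util.Random

// Hypothetical sketch of MinHash; names and hash family are assumptions,
// not taken from this repository.
object MinHashSketch {
  // Prime modulus for the assumed hash family h(x) = (a*x + b) mod p.
  val Prime: Long = 2147483647L // 2^31 - 1

  /** Draw k random (a, b) coefficient pairs, one per hash function. */
  def makeHashes(k: Int, seed: Long): Vector[(Long, Long)] = {
    val rng = new Random(seed)
    // a is in [1, p-1], b is in [0, p-1].
    Vector.fill(k)((1L + rng.nextInt(Int.MaxValue - 1), rng.nextInt(Int.MaxValue).toLong))
  }

  /** Signature: for each hash function, the minimum hash value over the set. */
  def signature(items: Set[Int], hashes: Vector[(Long, Long)]): Vector[Long] = {
    require(items.nonEmpty, "MinHash is undefined for an empty set")
    hashes.map { case (a, b) =>
      items.iterator.map(x => java.lang.Math.floorMod(a * x + b, Prime)).min
    }
  }

  /** Estimated Jaccard similarity: fraction of signature positions that agree. */
  def estimate(s1: Vector[Long], s2: Vector[Long]): Double = {
    require(s1.nonEmpty && s1.length == s2.length, "signatures must have equal, non-zero length")
    s1.zip(s2).count { case (x, y) => x == y }.toDouble / s1.length
  }

  /** Exact Jaccard similarity, for comparison with the estimate. */
  def jaccard(a: Set[Int], b: Set[Int]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size
}
```

With few hash functions the estimate is coarse: using 4 functions it can only take the values 0, 0.25, 0.5, 0.75, or 1. More hash functions give a finer-grained estimate at the cost of longer signatures.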

##Requirements

##Extra Folders

##Build and Run

    $> sbt "run <path to input> <number of hash functions> <user to recommend for>"

For example:

    $> sbt "run input/sampleInput2.txt 4 2"

The output is a `List[(userId, SimIndex)]` sorted by `SimIndex` in descending order, followed by a list of products to recommend.

Note: with more than 5 hash functions, you may need to run the algorithm several times before a recommendation is shown.
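
The shape of that output can be sketched as follows. This is a guess at the logic, not the repository's implementation: the object `RecommendSketch`, the `baskets` map from user ID to product IDs, and the rule "recommend products held by users with non-zero similarity that the target user lacks" are all assumptions made for illustration. Exact Jaccard similarity stands in for the MinHash estimate so that the example is self-contained.

```scala
// Hypothetical sketch of ranking users and picking recommendations;
// data layout and recommendation rule are assumptions, not this repo's code.
object RecommendSketch {
  /** Exact Jaccard similarity, used here as a stand-in for a MinHash estimate. */
  def jaccard(a: Set[Int], b: Set[Int]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  /** Rank every other user by similarity to `target`, highest first (ties by user ID). */
  def rank(target: Int,
           baskets: Map[Int, Set[Int]],
           sim: (Set[Int], Set[Int]) => Double): List[(Int, Double)] =
    baskets.toList
      .collect { case (user, items) if user != target => (user, sim(baskets(target), items)) }
      .sortWith { (x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1) }

  /** Products of users with non-zero similarity that the target does not already have. */
  def recommend(target: Int,
                baskets: Map[Int, Set[Int]],
                ranked: List[(Int, Double)]): List[Int] =
    ranked
      .filter { case (_, score) => score > 0.0 }
      .flatMap { case (user, _) => baskets(user).toList.sorted }
      .distinct
      .filterNot(baskets(target).contains)
}
```

Under this rule the recommendation list is empty whenever no other user has a non-zero similarity to the target.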
