Harivallabha / Plagiarism-Checker-using-Locality-Sensitive-Hashing

Locality Sensitive Hashing for evaluating similarity scores. Various distance measures have been implemented, namely Jaccard, Cosine, Euclidean, Manhattan, and Hamming.

Plagiarism-Checker-using-Locality-Sensitive-Hashing

Locality Sensitive Hashing for evaluating similarity scores. Various distance measures have been implemented, namely Jaccard, Cosine, Euclidean, Manhattan, and Hamming. Each distance measure is implemented in a separate class, and each class has its own main function, so it can be executed independently. Development was done using the IntelliJ IDE (JetBrains).
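For orientation, the five measures named above can be sketched in a few lines each. This is a minimal illustration using the textbook definitions, not the repository's code: the class name DistanceSketch, the method names, and the input types (shingle sets, double vectors, boolean arrays) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class for illustration only; the repository keeps each measure in its own class.
class DistanceSketch {

    // Jaccard similarity: |A ∩ B| / |A ∪ B| over two sets of shingles.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention chosen for this sketch: two empty sets are identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Cosine similarity: dot(x, y) / (|x| * |y|). Assumes equal lengths and non-zero vectors.
    static double cosine(double[] x, double[] y) {
        double dot = 0, normX = 0, normY = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    // Euclidean distance: square root of the summed squared differences.
    static double euclidean(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of absolute differences.
    static double manhattan(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    // Hamming distance: number of positions where two equal-length bit vectors differ.
    static int hamming(boolean[] x, boolean[] y) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> b = new HashSet<>(Arrays.asList("b", "c", "d"));
        System.out.println("Jaccard:   " + jaccard(a, b));
        System.out.println("Euclidean: " + euclidean(new double[] {0, 0}, new double[] {3, 4}));
        System.out.println("Manhattan: " + manhattan(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```

Jaccard and cosine are similarities (higher means more alike), while Euclidean, Manhattan, and Hamming are distances (lower means more alike), so their scores are not directly comparable without a conversion.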

To run a class from the command line, use the usual commands for .java files, e.g.:

javac FileName.java
java FileName

The doc1 and doc[i] lists hold the shingles generated from the .txt files at the given paths: doc1 for the query document and doc[i] for the dataset documents. Both are defined in the textFilesMinHash function of each class, where the file paths can be changed as needed.
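As a rough illustration of how shingles can feed a MinHash comparison, the sketch below builds word-level shingles from two strings and estimates their Jaccard similarity from MinHash signatures. Everything in it is an assumption made for this example rather than the repository's implementation: the class name MinHashSketch, the method names, the choice of word-level shingles of size k, the hash family (a*x + b) mod p, and the use of in-memory strings instead of file paths.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; not the repository's textFilesMinHash implementation.
class MinHashSketch {

    static final int PRIME = 2147483647; // 2^31 - 1, modulus for the hash family

    // Splits text into lower-cased words and returns every run of k consecutive words.
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + k <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return result;
    }

    // For each of numHashes functions h(x) = (a*x + b) mod PRIME, keeps the minimum over all shingles.
    // The same seed must be used for every document so the signatures are comparable.
    static long[] signature(Set<String> shingleSet, int numHashes, long seed) {
        Random rng = new Random(seed);
        long[] sig = new long[numHashes];
        for (int h = 0; h < numHashes; h++) {
            long a = 1 + rng.nextInt(PRIME - 1); // 1 .. PRIME-1
            long b = rng.nextInt(PRIME);         // 0 .. PRIME-1
            long min = Long.MAX_VALUE;
            for (String s : shingleSet) {
                long x = s.hashCode() & 0x7fffffffL; // map the shingle to a non-negative integer
                min = Math.min(min, (a * x + b) % PRIME);
            }
            sig[h] = min;
        }
        return sig;
    }

    // The fraction of matching signature positions estimates the Jaccard similarity.
    static double estimateSimilarity(long[] sigA, long[] sigB) {
        int matches = 0;
        for (int i = 0; i < sigA.length; i++) {
            if (sigA[i] == sigB[i]) {
                matches++;
            }
        }
        return (double) matches / sigA.length;
    }

    public static void main(String[] args) {
        Set<String> query = shingles("the quick brown fox jumps over the lazy dog", 2);
        Set<String> candidate = shingles("the quick brown fox leaps over the lazy dog", 2);
        long[] sigQuery = signature(query, 200, 42L);
        long[] sigCandidate = signature(candidate, 200, 42L);
        System.out.println("Estimated Jaccard similarity: " + estimateSimilarity(sigQuery, sigCandidate));
    }
}
```

More hash functions give a tighter estimate at the cost of longer signatures. To work from files as the README describes, the strings could be read with something like java.nio.file.Files before shingling.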
