ajaykumarr123 / pyspark-adalsh

PySpark implementation of Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing

Spark Adaptive LSH

Top-K Entity Resolution for Apache Spark. The algorithm is described in the paper "Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing" by Vasilis Verroios and Hector Garcia-Molina of Stanford University, available here. Some of the Adaptive LSH code is based on the pyspark-lsh project, an implementation of the classic LSH technique.
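For readers unfamiliar with the classic LSH technique mentioned above, the following is a minimal, pure-Python sketch of MinHash signatures with banding, the usual way LSH is used to find candidate duplicate records. It is not this repository's code, not the pyspark-lsh API, and not the adaptive algorithm from the paper. All names (`tokenize`, `minhash_signature`, `candidate_pairs`) and the parameters `num_bands` and `rows_per_band` are hypothetical and chosen only for illustration.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

# Illustrative sketch only: hypothetical helper names, not this project's API.
_PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1, used as the hash modulus


def tokenize(text):
    """Map a record string to a set of integer tokens (one per word)."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return {zlib.crc32(w.encode("utf-8")) for w in text.lower().split()}


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(t) = (a*t + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """MinHash signature: the minimum hash value of the token set per function."""
    return [min((a * t + b) % _PRIME for t in tokens) for a, b in params]


def candidate_pairs(records, num_bands=8, rows_per_band=4, seed=0):
    """Return record-id pairs whose signatures agree on at least one full band."""
    params = make_hash_params(num_bands * rows_per_band, seed)
    buckets = defaultdict(set)
    for rec_id, tokens in records.items():
        if not tokens:
            continue  # an empty record has no signature
        sig = minhash_signature(tokens, params)
        for band in range(num_bands):
            start = band * rows_per_band
            key = (band, tuple(sig[start:start + rows_per_band]))
            buckets[key].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        # Every two records sharing a bucket become a candidate pair.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    records = {
        "r1": tokenize("Hector Garcia-Molina Stanford University"),
        "r2": tokenize("hector garcia-molina stanford university"),
        "r3": tokenize("an unrelated record about something else"),
    }
    print(candidate_pairs(records))
```

Records with identical token sets always land in the same bucket for every band, so they are always reported as candidates. For other pairs, more rows per band makes the filter stricter and more bands makes it more permissive.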

About

License: GNU General Public License v3.0


Languages

Python 100.0%