lixing-yang / MSMP

MSMP

This assignment examines the performance of MSMP+ (Multi-component Similarity Method with Pre-selection, based on locality-sensitive hashing, LSH), a product duplicate detection method, combined with a revised method for model word extraction.

main.py

This file carries out the main tasks of the assignment: it retrieves the data, runs the bootstrapping, and reports the final performance of both LSH and MSM.
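The bootstrapping and evaluation steps could look roughly like the sketch below. It assumes an out-of-bag split (products drawn with replacement form the training set, the rest form the test set) and F1 over duplicate pairs as the performance measure. Neither choice is confirmed by this README, and `bootstrap_split` and `f1_score` are hypothetical names, not functions from `main.py`.

```python
import random


def bootstrap_split(n_items, seed=None):
    """Draw n_items indices with replacement; return (train, test) index lists.

    Assumption: drawn indices form the training set and the out-of-bag
    indices form the test set.
    """
    rng = random.Random(seed)
    drawn = {rng.randrange(n_items) for _ in range(n_items)}
    train = sorted(drawn)
    test = [i for i in range(n_items) if i not in drawn]
    return train, test


def f1_score(predicted_pairs, true_pairs):
    """F1 of predicted duplicate pairs against the true duplicate pairs.

    Both arguments are sets of (i, j) tuples with i < j.
    """
    true_positives = len(predicted_pairs & true_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


train_idx, test_idx = bootstrap_split(10, seed=1)
score = f1_score({(0, 1), (2, 3)}, {(0, 1), (4, 5)})
```

Repeating the split with different seeds and averaging the scores gives a bootstrap estimate of performance.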

msmp.py

This file defines the functions needed for MSMP: LSH, MSM, and the hierarchical clustering method.
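As an illustration of the LSH pre-selection step, here is a minimal sketch that min-hashes each product's set of model words and then applies banding to find candidate pairs. The function names, the hash family, and the parameter values are assumptions made for this example and are not taken from `msmp.py`.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

_PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def minhash_signature(model_words, num_hashes=20, seed=0):
    """Min-hash signature of a non-empty set of model words.

    The same seed must be used for every product so that all signatures
    share the same hash functions.
    """
    if not model_words:
        raise ValueError("model_words must not be empty")
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
              for _ in range(num_hashes)]
    # crc32 gives a stable integer id per word (built-in hash() is salted).
    ids = [zlib.crc32(word.encode("utf-8")) for word in model_words]
    return [min((a * x + b) % _PRIME for x in ids) for a, b in coeffs]


def lsh_candidate_pairs(signatures, bands):
    """Return pairs (i, j), i < j, that share a bucket in at least one band."""
    length = len(signatures[0])
    if length % bands != 0:
        raise ValueError("signature length must be divisible by bands")
    rows = length // bands
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for index, signature in enumerate(signatures):
            key = tuple(signature[band * rows:(band + 1) * rows])
            buckets[key].append(index)
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    return candidates


products = [
    {"32inch", "120hz", "un32eh4003"},
    {"32inch", "120hz", "un32eh4003"},
    {"55inch", "60hz", "kdl55"},
]
signatures = [minhash_signature(words) for words in products]
pairs = lsh_candidate_pairs(signatures, bands=4)
```

Only the candidate pairs returned by LSH would then be passed to the more expensive MSM comparison. More bands with fewer rows each make the filter more permissive.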

msm.py

This file defines the functions needed for MSM.
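MSM compares product attributes using string similarity. One common building block is a normalized q-gram overlap, sketched below. This README does not say which similarity functions `msm.py` contains, so `qgrams` and `qgram_similarity` are hypothetical names, and the choice of q = 3 is an assumption.

```python
from collections import Counter


def qgrams(text, q=3):
    """Multiset of overlapping q-character substrings of the lowercased text."""
    text = text.lower()
    if len(text) < q:
        # Short strings are treated as a single gram; empty strings have none.
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def qgram_similarity(a, b, q=3):
    """Share of q-grams that the two strings have in common, in [0, 1]."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


similarity = qgram_similarity("abcd", "abce")
```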

dataprocessing.py

This file defines the data processing functions, including data cleaning and model word extraction.
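A minimal sketch of what cleaning and model word extraction might look like follows. It assumes that a model word is a title token containing both letters and digits, and that cleaning normalizes a few unit spellings. The revised extraction method used in this repository is not described in this README, so `clean_title` and `extract_model_words` are hypothetical and may differ from `dataprocessing.py`.

```python
import re

# Assumed unit normalizations; the patterns expect lowercased input.
_UNIT_PATTERNS = [
    (re.compile(r'\s*(?:"|inches|inch|-inch)'), "inch"),
    (re.compile(r"\s*(?:hertz|hz|-hz)"), "hz"),
]


def clean_title(title):
    """Lowercase the title, normalize units, and strip bracket punctuation."""
    text = title.lower()
    for pattern, unit in _UNIT_PATTERNS:
        text = pattern.sub(unit, text)
    text = re.sub(r"[()\[\]/,;:]", " ", text)
    return " ".join(text.split())


def extract_model_words(title):
    """Return the set of cleaned tokens that mix letters and digits."""
    words = set()
    for token in clean_title(title).split():
        has_digit = any(ch.isdigit() for ch in token)
        has_alpha = any(ch.isalpha() for ch in token)
        if has_digit and has_alpha:
            words.add(token)
    return words


model_words = extract_model_words('Samsung 32" LED-TV (120 Hz) UN32EH4003')
```

Normalizing units before tokenizing means that `32"`, `32 inch`, and `32-inch` all produce the same model word.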

Languages

Python 100.0%