austrian-code-wizard / duplicateDetector

Duplicate Detection

Efficient implementation for finding the pairs with the lowest Levenshtein distance in a list of Excel data
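The README does not show the repository's own algorithm. What follows is therefore only a minimal, standard-library sketch of the underlying idea, not the project's implementation. It uses a dynamic-programming Levenshtein distance and an all-pairs search that returns every pair tied for the lowest distance. The names levenshtein and closest_pairs are hypothetical. The search is still quadratic in the number of values. It skips any pair whose length difference already exceeds the best distance found so far, because the Levenshtein distance between two strings is never smaller than the difference in their lengths.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; rows have len(b) + 1 cells
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def closest_pairs(values: List[str]) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Return the lowest pairwise distance and every index pair that reaches it."""
    best: Optional[int] = None
    pairs: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(values)), 2):
        a, b = values[i], values[j]
        # Levenshtein distance >= length difference, so such pairs cannot tie or win.
        if best is not None and abs(len(a) - len(b)) > best:
            continue
        d = levenshtein(a, b)
        if best is None or d < best:
            best, pairs = d, [(i, j)]
        elif d == best:
            pairs.append((i, j))
    return best, pairs


if __name__ == "__main__":
    names = ["Jon Smith", "John Smith", "Jane Doe", "Jane Do"]
    print(closest_pairs(names))  # (1, [(0, 1), (2, 3)])
```

In practice the values would be the cell contents of one or more columns read from the Excel file. This sketch does not show how the repository loads or combines those cells.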

How to install:

Install Git

Clone repository: git clone https://github.com/austrian-code-wizard/duplicateDetector

Alternatively, use the GitHub web interface to download the repository

Move into repository: cd duplicateDetector

Make sure you have Python 3.7 installed.

Install virtualenv: python3 -m pip install virtualenv

Create venv: python3 -m virtualenv venv

Activate venv: . venv/bin/activate

Install the package: python setup.py install

Run the example (first edit example.py so that it points to a valid Excel file path and fields): python example.py
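The steps above assume Python 3.7. A small helper like the one below can confirm which interpreter is active, for example inside the activated venv, before you install anything. The helper is hypothetical and is not part of the repository. It checks for an exact 3.7 match only because that is the single version this README names. The README does not say that newer versions fail.

```python
import sys

REQUIRED = (3, 7)  # the version named in this README (assumption: exact match wanted)


def python_version_ok(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True if the interpreter's major.minor equals the required version."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"Python {current} matches the README.")
    else:
        print(f"Python {current} found; the README asks for 3.7.")
```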

Languages: Python 80.4%, JavaScript 13.6%, CSS 3.7%, HTML 2.4%