Dedupe.io (dedupeio)

De-duplicate and find matches in your Excel spreadsheet or database

Location: Chicago, IL

Home Page: https://dedupe.io/

Dedupe.io's repositories

dedupe

:id: A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

Language: Python | License: MIT | Stargazers: 3971 | Issues: 119 | Issues: 805

dedupe-examples

:id: Examples for using the dedupe library

Language: Python | License: MIT | Stargazers: 394 | Issues: 26 | Issues: 94

address-matching

Python script for matching a list of messy addresses against a gazetteer using dedupe.

affinegap

:triangular_ruler: A Cython implementation of the affine gap string distance

Language: Cython | License: MIT | Stargazers: 58 | Issues: 5 | Issues: 9

hcluster

Hierarchical Clustering Algorithms

Language: Python | License: NOASSERTION | Stargazers: 34 | Issues: 7 | Issues: 14

pyhacrf

:triangular_ruler: Hidden alignment conditional random field for classifying string pairs.

Language: Python | License: BSD-3-Clause | Stargazers: 24 | Issues: 5 | Issues: 9

pylbfgs

:mountain_cableway: Python/Cython wrapper for liblbfgs

Language: C | License: MIT | Stargazers: 24 | Issues: 5 | Issues: 9

dedupe-geocoder

:round_pushpin: Demonstration of how dedupe might be used as a geocoder

Language: Python | License: MIT | Stargazers: 17 | Issues: 7 | Issues: 3

doublemetaphone

:sound: Python wrapper for a C++ Double Metaphone

Language: C++ | License: Artistic-2.0 | Stargazers: 14 | Issues: 3 | Issues: 5

fuzzycategory

:triangular_ruler: Fuzzy Categorical Distances

Language: Python | License: MIT | Stargazers: 14 | Issues: 6 | Issues: 0

rlr

Regularized Logistic Regression

Language: Python | License: NOASSERTION | Stargazers: 11 | Issues: 5 | Issues: 6

dedupe-variable-person

Dedupe variable for person names: just people, no companies.

dedupe-variable-address

Address Variable Type for dedupe

dedupe-variable-name

Name variable type for dedupe

dedupeio-web-api-docs

The Dedupe.io web API allows matching and training against projects using a standard RESTful framework.

highered

CRF Edit Distance

Levenshtein_search

Python search module for fast approximate string matching

Language: C | License: MIT | Stargazers: 6 | Issues: 4 | Issues: 0

soft-tfidf

Misspelling-tolerant TF-IDF similarity metric

categorical-distance

:triangular_ruler: Compare categorical variables

dedupe-variable-datetime

DateTime variable for dedupe

dedupe-variable-fuzzycategory

Dedupe Variable for Fuzzy Categories

dedupe-vowpal

Vowpal Wabbit Active Labeler for Dedupe

Language: Python | License: MIT | Stargazers: 4 | Issues: 4 | Issues: 0

learned-string-alignments

Learning String Alignments for Entity Aliases

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 4 | Issues: 0

datetime-distance

📐 Compare dates and times

Language: Python | License: MIT | Stargazers: 3 | Issues: 4 | Issues: 2

dedupe-variable-number

Try to cast strings to numbers, then compare

parseratorvariable

Base class for dedupe variables for parsed fields

simplecosine

:triangular_ruler: Simple cosine distance

dedupe-variable-ilcs

Dedupe variable for Illinois Compiled Statute (ILCS) codes

Language: Python | License: MIT | Stargazers: 2 | Issues: 3 | Issues: 2

fastcluster

Fast hierarchical clustering routines for R and Python.

Language: C++ | License: NOASSERTION | Stargazers: 1 | Issues: 2 | Issues: 0