There are 12 repositories under the data-matching topic.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
A powerful and modular toolkit for record linkage and duplicate detection in Python
A list of free data matching and record linkage software.
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Resources for tackling record linkage / deduplication / data matching problems
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
A browser user interface for manual labeling of record pairs.
Welcome to Snowman App – a Data Matching Benchmark Platform.
A maximum-strength name parser for record linkage.
https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
A collection of awesome resources regarding Record Linkage.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
Weka Comparator to match rules to test data with filtering abilities
Service for automatically matching two data sets without mapping
Undergraduate Final Project (README needs updating). Scientific paper to be included soon.
Crawl, match, and explore data about jobs in Viet Nam.
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
Repository for CS 838 (Spring 2017) Data Science project
Unstructured Record Linkage using Siamese Networks and Large Language Models (LLMs) such as LLAMA3 and ChatGPT-4o.