validity-lib

A fun little library for determining duplicate contacts. There are both normal and advanced datasets for you to play with in the /data directory.

Installation

Prerequisites

Ensure you have either the current release of Node.js or the latest LTS release. This was tested on Node.js 10.11.

Simply clone or fork the repository then run:

npm install

in the root of the repository.

npm start

NOTE: main.js in the root directory is the entry point. It exists so that ES6 module syntax works properly in Node (thank you, esm!).

npm test

'nuff said.

This algorithm is NOT suited to de-duplicating non-business contacts. Based on the data structure, it assumes the contacts won't be people who live at the same address and share a phone number and email address but have different first names (e.g. older married couples, or families with landlines).
Further work could be done on data cleaning, testing, validation, and edge-case checking.
Performance will definitely start to degrade once you have more than 3k - 4k nameAddress keys and lots of obscure, loosely similar names. This is mostly due to the distance function, which has to compare each key against many others. Optimizing the search around alphabetical order may help with that.
The nameAddress keys should likely be generated up front, at parsing time, by the contact itself. That would be a good, simple refactor that makes things easier to read and faster to execute.
Metaphone 3 is likely a better/faster way of handling the names; however, a license is ~$260 right now.
The duplicateContactProvider could use some additional CQS (command-query separation) work to help clean up the logic.
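
To make two of the ideas above concrete (generating the nameAddress key on the contact at parse time, and narrowing the distance search alphabetically), here is a rough sketch. It is not the library's code. The Contact class, the field names (firstName, lastName, address), the key format, and the findLikelyDuplicates helper are all hypothetical, and a plain Levenshtein distance stands in for whatever distance function the library actually uses.

```javascript
// Hypothetical sketch: none of these names come from validity-lib itself.

// Plain Levenshtein edit distance, used here as a stand-in distance function.
function levenshtein(a, b) {
  // prev holds one row of the edit-distance table at a time.
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // table value at (i - 1, j - 1)
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j]; // table value at (i - 1, j)
      prev[j] = Math.min(
        above + 1, // deletion
        prev[j - 1] + 1, // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      diag = above;
    }
  }
  return prev[b.length];
}

// The contact builds its own key once, when it is parsed.
class Contact {
  constructor({ firstName, lastName, address }) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.address = address;
    this.nameAddressKey = [lastName, firstName, address]
      .map((part) => String(part || "").trim().toLowerCase())
      .join("|");
  }
}

// Only compare keys that start with the same letter, so the pairwise
// distance checks run within small buckets instead of across the full set.
function findLikelyDuplicates(contacts, maxDistance = 2) {
  const buckets = new Map();
  for (const contact of contacts) {
    const letter = contact.nameAddressKey.charAt(0);
    if (!buckets.has(letter)) buckets.set(letter, []);
    buckets.get(letter).push(contact);
  }

  const pairs = [];
  for (const bucket of buckets.values()) {
    for (let i = 0; i < bucket.length; i++) {
      for (let j = i + 1; j < bucket.length; j++) {
        const d = levenshtein(bucket[i].nameAddressKey, bucket[j].nameAddressKey);
        if (d <= maxDistance) pairs.push([bucket[i], bucket[j]]);
      }
    }
  }
  return pairs;
}
```

The trade-off with first-letter bucketing is that two keys whose first characters differ (say, a typo in the first letter of a last name) are never compared, so this buys speed at the cost of some recall. Whether that is acceptable depends on the data.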