sanjayaksaxena / wink-jaro-distance

An Implementation of Jaro Distance Algorithm by Matthew A. Jaro

wink-jaro-distance

An Implementation of Jaro Distance Algorithm by Matthew A. Jaro

De-duplicate short strings such as names by computing the similarity and distance between a pair of strings using wink-jaro-distance. It is part of wink, a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in Node.js.

It implements the Jaro Distance Algorithm, which determines the similarity and distance between two strings by taking into account insertions, deletions and transpositions.
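The computation can be sketched as follows. This is a minimal illustration of the commonly published Jaro formula, not this package's source code; the name `jaroSketch` is hypothetical, and treating two empty strings as identical is an assumption. Characters count as matching when they are equal and no farther apart than floor(max(|s1|, |s2|) / 2) - 1 positions; half the number of matched characters that appear in a different order gives the transposition count.

```javascript
// Hypothetical sketch of the textbook Jaro formula; not the package's source.
function jaroSketch( s1, s2 ) {
  // Assumption: two empty strings are treated as identical.
  if ( s1.length === 0 && s2.length === 0 ) {
    return { distance: 0, similarity: 1 };
  }
  // Equal characters match only if they are at most this many positions apart.
  var windowSize = Math.max( 0, Math.floor( Math.max( s1.length, s2.length ) / 2 ) - 1 );
  var matched1 = new Array( s1.length ).fill( false );
  var matched2 = new Array( s2.length ).fill( false );
  var matches = 0;
  var i, j;

  for ( i = 0; i < s1.length; i += 1 ) {
    var lo = Math.max( 0, i - windowSize );
    var hi = Math.min( s2.length - 1, i + windowSize );
    for ( j = lo; j <= hi; j += 1 ) {
      if ( !matched2[ j ] && s1[ i ] === s2[ j ] ) {
        matched1[ i ] = true;
        matched2[ j ] = true;
        matches += 1;
        break;
      }
    }
  }
  if ( matches === 0 ) {
    return { distance: 1, similarity: 0 };
  }

  // Walk the matched characters of both strings in order;
  // each position where they differ counts as half a transposition.
  var outOfOrder = 0;
  j = 0;
  for ( i = 0; i < s1.length; i += 1 ) {
    if ( matched1[ i ] ) {
      while ( !matched2[ j ] ) j += 1;
      if ( s1[ i ] !== s2[ j ] ) outOfOrder += 1;
      j += 1;
    }
  }
  var transpositions = outOfOrder / 2;

  var similarity = ( matches / s1.length + matches / s2.length +
    ( matches - transpositions ) / matches ) / 3;
  return { distance: 1 - similarity, similarity: similarity };
}

console.log( jaroSketch( 'father', 'farther' ).similarity ); // approximately 0.952
```

For 'father' and 'farther', all six characters of the shorter string match with no transpositions, so the similarity is (6/6 + 6/7 + 6/6) / 3.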

Installation

Use npm to install:

npm install wink-jaro-distance --save

Example

// Load Jaro Distance Function
var jaro = require( 'wink-jaro-distance' );

console.log( jaro( 'father', 'farther' ) );
// -> { distance: 0.04761904761904756, similarity: 0.9523809523809524 }

console.log( jaro( 'Angelina', 'Angelica' ) );
// -> { distance: 0.08333333333333337, similarity: 0.9166666666666666 }

console.log( jaro( 'Flikr', 'Flicker' ) );
// -> { distance: 0.09523809523809523, similarity: 0.9047619047619048 }

console.log( jaro( 'abcdef', 'fedcba' ) );
// -> { distance: 0.6111111111111112, similarity: 0.38888888888888884 }

API

jaro

Computes Jaro distance and similarity between strings s1 and s2.

Original Reference: UNIMATCH: A Record Linkage System: Users Manual, p. 104.

Parameters

  • s1 string — the first string.
  • s2 string — the second string.

Examples

// returns { distance: 0.08333333333333337, similarity: 0.9166666666666666 }
jaro( 'daniel', 'danielle' );

Returns an object containing the distance and similarity values, each between 0 and 1; as the examples show, the two values add up to 1.
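Because the returned object carries a similarity score, near-duplicate names can be flagged by comparing that score with a threshold. The sketch below is one possible way to do this, not part of the package: the helper `findNearDuplicates`, its result shape, and the threshold of 0.9 are all hypothetical, and `similarityFn` is assumed to return `{ distance, similarity }` the way `jaro` does.

```javascript
// Hypothetical helper: list every pair whose similarity meets a threshold.
// `similarityFn` is assumed to return { distance, similarity } like jaro().
function findNearDuplicates( names, similarityFn, threshold ) {
  var pairs = [];
  for ( var i = 0; i < names.length; i += 1 ) {
    for ( var j = i + 1; j < names.length; j += 1 ) {
      var result = similarityFn( names[ i ], names[ j ] );
      if ( result.similarity >= threshold ) {
        pairs.push( { a: names[ i ], b: names[ j ], similarity: result.similarity } );
      }
    }
  }
  return pairs;
}

// Usage, assuming the package is installed:
// var jaro = require( 'wink-jaro-distance' );
// findNearDuplicates( [ 'Angelina', 'Angelica', 'Flikr' ], jaro, 0.9 );
```

Every pair is compared once, so the number of calls grows quadratically with the list size; this is reasonable for short lists of names but may need blocking or indexing for large ones.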

Need Help?

If you spot a bug that has not yet been reported, raise a new issue, or consider fixing it and sending a pull request.

Copyright & License

wink-jaro-distance is copyright 2017 GRAYPE Systems Private Limited.

It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.
