FlakyCloudy / smart-match

The smart-match module contains functions for calculating strings/sets similarity.

Introduction

smart-match is a Python module for computing how similar two strings or sets are, using the methods listed below.

Concept

  1. similarity: A value in the range [0, 1] that represents how similar two strings are. The larger the value, the more similar the strings.

  2. dissimilarity: A value in the range [0, 1] that represents how dissimilar two strings are. The larger the value, the more dissimilar the strings. For any pair of strings, similarity = 1 - dissimilarity.

  3. distance: How far apart two strings are. Note that not all methods support distance.

  4. score: The larger the score, the more similar the two strings are. Note that not all methods provide a score.

We support three levels of string matching.

  1. char: Similarity computation based on characters in the strings.

  2. term: Similarity computation based on terms in the strings.

  3. gram: Similarity computation based on q-grams in the strings.
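To make the three levels concrete, here is a minimal sketch in plain Python. It does not use smart-match itself: the helper names (char_tokens, term_tokens, qgram_tokens, jaccard) are hypothetical, and the choices of whitespace splitting for terms and q = 2 for grams are assumptions made for illustration, not the library's documented behavior.

```python
def char_tokens(s):
    """Char level: every character is a token."""
    return list(s)


def term_tokens(s):
    """Term level: split on whitespace (an assumption about what a 'term' is)."""
    return s.split()


def qgram_tokens(s, q=2):
    """Gram level: all overlapping substrings of length q."""
    if len(s) < q:
        return [s] if s else []
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def jaccard(a, b):
    """Jaccard similarity of two token collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty inputs are treated as identical
    return len(sa & sb) / len(sa | sb)


# The same set-based measure gives different answers at each level.
print(jaccard(char_tokens("hello"), char_tokens("hero")))      # 3 shared / 5 total chars
print(jaccard(qgram_tokens("hello"), qgram_tokens("hero")))    # 1 shared / 6 total bigrams
print(jaccard(term_tokens("hello world"), term_tokens("hello there")))  # 1 shared / 3 total terms
```

The point of the sketch is that the level only changes how a string is broken into tokens; the similarity measure applied afterwards can stay the same.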

Methods

We support the following methods.

Method similarity dissimilarity distance score
Levenshtein (default)
Euclidean
Damerau Levenshtein
Block Distance
Cosine
Tanimoto Coefficient
Dice
Simon White
Longest Common Substring
Longest Common Subsequence
Overlap Coefficient
Generalized Overlap Coefficient
Jaccard
Generalized Jaccard
Hamming
Jaro
Jaro Winkler
Needleman Wunsch
Smith Waterman
Smith Waterman Gotoh
Monge Elkan
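Several of the listed methods are set-based. As a rough illustration, the textbook definitions of the Dice and Overlap coefficients can be sketched as follows. The function names and the handling of empty inputs are assumptions for this example; smart-match's own implementations may differ, for instance by counting repeated tokens.

```python
def dice(a, b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|), with inputs treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # assumption: two empty inputs count as identical
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def overlap_coefficient(a, b):
    """Overlap coefficient: |A & B| / min(|A|, |B|), with inputs treated as sets."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0  # assumption: nothing can overlap with an empty input
    return len(sa & sb) / min(len(sa), len(sb))


left = ["a", "b"]
right = ["b", "c", "d"]
print(dice(left, right))                 # 2 * 1 / (2 + 3)
print(overlap_coefficient(left, right))  # 1 / min(2, 3)
```

Both values lie in [0, 1] and grow with the number of shared tokens, which matches the definition of similarity given under Concept.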

Installation

pip install smart-match

Usage

import smart_match
print(smart_match.similarity('hello', 'hero'))
print(smart_match.dissimilarity('hello', 'hero'))
print(smart_match.distance('hello', 'hero'))

Output:

0.6
0.4
2
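The numbers above are consistent with the default Levenshtein method: turning 'hello' into 'hero' takes two edits, and 2 divided by the longer length 5 gives a dissimilarity of 0.4. The following self-contained sketch reproduces them without smart-match. The function names are hypothetical, and normalizing by the longer string's length is an assumption inferred from the output shown, not taken from the library's source.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_dissimilarity(a, b):
    """Distance scaled into [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein_distance(a, b) / longest


def levenshtein_similarity(a, b):
    """similarity = 1 - dissimilarity, as defined under Concept."""
    return 1 - levenshtein_dissimilarity(a, b)


print(levenshtein_similarity('hello', 'hero'))     # 0.6
print(levenshtein_dissimilarity('hello', 'hero'))  # 0.4
print(levenshtein_distance('hello', 'hero'))       # 2
```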

See the project wiki for more details.

License

smart-match is free software. See the file LICENSE for the full text.

License: MIT License


Languages

Language: Python 100.0%