soenneker / soenneker.utils.string.jaccardsimilarity

A utility library for comparing strings via the Jaccard similarity algorithm

Home Page: https://soenneker.com


Soenneker.Utils.String.JaccardSimilarity

Installation

dotnet add package Soenneker.Utils.String.JaccardSimilarity

Why?

Jaccard similarity measures how much two sets of items overlap, and it's often used for tasks like detecting similar documents or recommending content. It's useful because:

Set-Focused: It works well when you care about which elements are present, not their order or how often they repeat.

Scale Doesn't Matter: The score is the shared fraction of the combined set, so it always falls between 0 and 1 (0 and 100 as a percentage) no matter how large the sets are.

Efficient: It's quick to calculate, making it suitable for large datasets.

Handles Noise Well: Extra, less important elements in either set lower the score gradually instead of invalidating the comparison.
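
For reference, these properties follow from the standard definition of the Jaccard index for two sets A and B, which is the size of their intersection divided by the size of their union:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```

A percentage score is this value multiplied by 100. How the library turns a string into a set of elements is not stated in this README.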

Usage

var text1 = "This is a test";
var text2 = "This is another test";

double result = JaccardSimilarityStringUtil.CalculateSimilarityPercentage(text1, text2); // 60
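
To see where a score of 60 can come from, here is a minimal sketch that treats each string as a set of space-separated words. The `JaccardPercentage` helper is hypothetical and is not the library's implementation. Word-level splitting is an assumption, chosen because it reproduces the result shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the library's implementation): treats each string
// as a set of space-separated words and returns |A ∩ B| / |A ∪ B| as a percentage.
static double JaccardPercentage(string first, string second)
{
    // Assumption: case-sensitive words split on single spaces; duplicates collapse in the set.
    var setA = new HashSet<string>(first.Split(' ', StringSplitOptions.RemoveEmptyEntries));
    var setB = new HashSet<string>(second.Split(' ', StringSplitOptions.RemoveEmptyEntries));

    // Two empty inputs leave the ratio undefined; this sketch reports 0.
    if (setA.Count == 0 && setB.Count == 0)
        return 0;

    var intersection = new HashSet<string>(setA);
    intersection.IntersectWith(setB);

    // |A ∪ B| = |A| + |B| - |A ∩ B|
    int unionCount = setA.Count + setB.Count - intersection.Count;

    return 100.0 * intersection.Count / unionCount;
}

// Shared words: This, is, test (3). Distinct words overall: This, is, a, another, test (5).
double sketchResult = JaccardPercentage("This is a test", "This is another test");
Console.WriteLine(sketchResult); // 3 / 5 of the words are shared
```

Under these assumptions the two sample strings share 3 of 5 distinct words, which matches the 60 in the usage example. The library may tokenize or normalize input differently for other strings.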

About

License: MIT License


Languages

Language: C# 100.0%