crfmc / similarity-measures

Python scripts that calculate three basic similarity measures suitable for ad hoc information retrieval systems: Levenshtein edit distance, Jaccard similarity, and a term-document matrix.

similarity-measures

This repository contains three independent Python scripts:

levenshtein.py

  • Uses the NumPy library to calculate the Levenshtein edit distance between two strings.
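
As a rough illustration of the idea, here is a minimal sketch of a NumPy-backed edit distance. It is not the code in levenshtein.py: the function name `levenshtein` is hypothetical, and the sketch assumes a cost of 1 for each insertion, deletion, and substitution (some information retrieval texts charge 2 for a substitution).

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, assuming unit cost for every operation."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i, j] holds the edit distance between a[:i] and b[:j].
    dist = np.zeros((rows, cols), dtype=int)
    dist[:, 0] = np.arange(rows)  # delete all i characters of a[:i]
    dist[0, :] = np.arange(cols)  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            # Assumed substitution cost: 0 if the characters match, else 1.
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i, j] = min(
                dist[i - 1, j] + 1,         # deletion
                dist[i, j - 1] + 1,         # insertion
                dist[i - 1, j - 1] + cost,  # match or substitution
            )
    return int(dist[-1, -1])


print(levenshtein("kitten", "sitting"))  # prints 3
```

The full table is kept here for readability; only the previous row is needed if memory matters.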

document_matrix.py

  • Calculates the document-term matrix from a collection of strings.
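
To make the structure concrete, the following sketch builds a small document-term matrix of raw term counts. It is an assumption-laden example rather than the contents of document_matrix.py: the function name `document_term_matrix` is hypothetical, tokens are produced by lowercasing and splitting on whitespace, and the vocabulary is sorted alphabetically.

```python
def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Assumed tokenization: lowercase, then split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]

    # One column per distinct term, in alphabetical order.
    vocab = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocab)}

    # One row per document, initialised to zero counts.
    matrix = [[0] * len(vocab) for _ in docs]
    for i, tokens in enumerate(tokenized):
        for term in tokens:
            matrix[i][column[term]] += 1
    return vocab, matrix


vocab, matrix = document_term_matrix(["the cat sat", "the cat saw the dog"])
print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
print(matrix)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Rows of this matrix can then be compared with a vector similarity such as cosine similarity; weighting schemes like TF-IDF would replace the raw counts.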

jaccard.py

  • Calculates the Jaccard similarity measure for two lists of strings.
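
The Jaccard measure is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each list of strings is treated as a set (duplicates ignored) and that two empty lists are given a similarity of 1.0 by convention. The function name `jaccard` is hypothetical and not necessarily what jaccard.py defines.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two lists of strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty inputs count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


print(jaccard(["a", "b", "c"], ["b", "c", "d"]))  # prints 0.5
```

The result ranges from 0.0 (no shared strings) to 1.0 (identical sets).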

Languages

Language:Python 100.0%