anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python

Home Page: https://sites.google.com/site/anhaidgroup/projects/py_stringmatching

py_stringmatching

This project seeks to build a Python software package that provides a comprehensive and scalable set of string tokenizers (such as alphabetical and whitespace tokenizers) and string similarity measures (such as edit distance, Jaccard, and TF/IDF). The package is free, open-source, and BSD-licensed.
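To make the two building blocks concrete, here is a minimal pure-Python sketch of what a whitespace tokenizer and two similarity measures (Jaccard and edit distance) compute. It deliberately does not use the package: the function names `whitespace_tokenize`, `jaccard`, and `edit_distance` are hypothetical and chosen only for illustration, and the package's own classes, signatures, and edge-case conventions may differ.

```python
def whitespace_tokenize(text):
    """Split a string on runs of whitespace and return the set of tokens."""
    return set(text.split())


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity: size of the intersection over size of the union.

    Two empty sets are treated as identical (1.0) in this sketch; that
    convention is an assumption, not taken from the package.
    """
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(s, t):
    """Levenshtein distance with unit costs, keeping one row of the DP table."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


a = whitespace_tokenize("data science")
b = whitespace_tokenize("data integration")
print(jaccard(a, b))                       # one shared token out of three distinct
print(edit_distance("kitten", "sitting"))  # classic example with distance 3
```

Tokenizers turn a string into tokens, and set-based measures such as Jaccard then compare the token sets. Character-based measures such as edit distance work on the raw strings directly.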

Important links

Dependencies

py_stringmatching has been tested on each Python version between 3.7 and 3.12, inclusive.

The required dependencies to build the package are NumPy 1.7.0 or higher and a C or C++ compiler. For the development version, you will also need Cython.
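As a rough pre-install check of the requirements above, the following sketch tests whether the running interpreter falls in the tested Python range and whether a NumPy installation can be located. The helper names `python_in_tested_range` and `numpy_available` are hypothetical, and the script does not verify the NumPy version (1.7.0 or higher) or the presence of a C or C++ compiler.

```python
import importlib.util
import sys

# Bounds taken from the tested Python versions stated above (inclusive).
MIN_PY, MAX_PY = (3, 7), (3, 12)


def python_in_tested_range(version=None):
    """Return True if (major, minor) lies within the tested range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def numpy_available():
    """Return True if NumPy can be located, without importing it."""
    return importlib.util.find_spec("numpy") is not None


print("Python in tested range:", python_in_tested_range())
print("NumPy found:", numpy_available())
```

A Python version outside the range is not necessarily unsupported; it only means the statement above does not cover it.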

Platforms

py_stringmatching has been tested on Linux, OS X, and Windows. So far it has been tested only on the x86 architecture.

About

License: BSD 3-Clause "New" or "Revised" License


Languages

Python 94.4%, Cython 3.4%, Batchfile 1.2%, PowerShell 1.0%