ZGorlock / FuzzyRegex

2012 - Fuzzy regex pattern matching algorithm with variable extraction. *Deprecated* - Moved to Java-Commons

Home Page: https://github.com/ZGorlock/Java-Commons/blob/main/src/commons/object/string/StringComparisonUtility.java

FuzzyRegex

Fuzzy Regex Pattern Matching and Capturing


Table of Contents:

  • Examples: Standard, Ambiguous
  • Usage: Methods, Parameters, Overloads

Examples

Standard

Match strings to patterns and extract variables even if the input text does not match the pattern exactly:

Pattern: "my name is ¿ and I am ¿ years old"

Input: "my name is John and I am 30 years old"
    Score: 1.0
    Variables: ["John", "30"]
    Tokens: ["my name is ", " and I am ", " years old"]

Input: "My names John and I'm 30 years old."
    Score: 0.8285714285714286
    Variables: ["John", "30"]
    Tokens: ["My names ", " and I'm ", " years old."]
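
When the input matches exactly, each ¿ behaves like a regex capture group, and the tokens are simply the literal pieces of the pattern between wildcards. The Java sketch below illustrates only that exact-match case; the class WildcardExtract and its methods are hypothetical and not part of Java-Commons. It returns null for the second input above, which is the case the fuzzy scoring exists to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical exact-match sketch: each wildcard becomes a regex capture group. */
final class WildcardExtract {
    static final String WILDCARD = "\u00BF"; // the inverted question mark wildcard

    /** Splits the pattern into the literal tokens that surround the wildcards. */
    static List<String> tokens(String pattern) {
        // limit -1 keeps empty tokens, e.g. when the pattern starts or ends with a wildcard
        return Arrays.asList(pattern.split(Pattern.quote(WILDCARD), -1));
    }

    /** Returns the wildcard values if text matches pattern exactly, otherwise null. */
    static List<String> variables(String pattern, String text) {
        List<String> tokens = tokens(pattern);
        StringBuilder regex = new StringBuilder(Pattern.quote(tokens.get(0)));
        for (int i = 1; i < tokens.size(); i++) {
            regex.append("(.+?)");                      // one lazy, non-empty group per wildcard
            regex.append(Pattern.quote(tokens.get(i))); // literal text, metacharacters escaped
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(text);
        if (!m.matches()) {
            return null; // no fallback here: this is where fuzzy matching would take over
        }
        List<String> vars = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            vars.add(m.group(g));
        }
        return vars;
    }

    public static void main(String[] args) {
        String pattern = "my name is \u00BF and I am \u00BF years old";
        System.out.println(tokens(pattern));
        System.out.println(variables(pattern, "my name is John and I am 30 years old"));
        System.out.println(variables(pattern, "My names John and I'm 30 years old."));
    }
}
```

For the first input this yields the same tokens and variables as the README output (["John", "30"]); for the second it prints null.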

Ambiguous

In ambiguous cases, all valid extraction results are returned:

Pattern: "What ¿ ¿s"

Input: "What the hell are lobsters"
    Score: 1.0
    
    Extraction 1:
        Variables: ["the", "hell are lobster"]
        Tokens: ["What ", " ", "s"]
    
    Extraction 2:
        Variables: ["the hell are", "lobster"]
        Tokens: ["What ", " ", "s"]
    
    Extraction 3:
        Variables: ["the hell", "are lobster"]
        Tokens: ["What ", " ", "s"]
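
The three extractions above are the only ways to fill both wildcards with non-empty text so that the input is reproduced exactly. A minimal backtracking sketch enumerates them. It assumes a wildcard never matches an empty string, handles exact matches only (no scoring), and uses a hypothetical AllExtractions class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: every exact way to fill the wildcards with non-empty text. */
final class AllExtractions {
    static final String WILDCARD = "\u00BF"; // the inverted question mark wildcard

    static List<List<String>> extract(String pattern, String text) {
        List<String> tokens = Arrays.asList(pattern.split(Pattern.quote(WILDCARD), -1));
        List<List<String>> results = new ArrayList<>();
        // the first literal token must be a prefix of the text
        if (text.startsWith(tokens.get(0))) {
            search(tokens, 1, text, tokens.get(0).length(), new ArrayList<>(), results);
        }
        return results;
    }

    private static void search(List<String> tokens, int next, String text, int pos,
                               List<String> vars, List<List<String>> results) {
        if (next == tokens.size()) {
            if (pos == text.length()) {
                results.add(new ArrayList<>(vars)); // all text consumed: record this extraction
            }
            return;
        }
        String literal = tokens.get(next);
        // try every non-empty variable text[pos, end) that is immediately followed by the literal
        for (int end = pos + 1; end <= text.length(); end++) {
            if (text.startsWith(literal, end)) {
                vars.add(text.substring(pos, end));
                search(tokens, next + 1, text, end + literal.length(), vars, results);
                vars.remove(vars.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        for (List<String> vars : extract("What \u00BF \u00BFs", "What the hell are lobsters")) {
            System.out.println(vars);
        }
    }
}
```

It prints the same three variable lists as the README, ordered by where the first variable ends: [the, hell are lobster], [the hell, are lobster], [the hell are, lobster].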

 


Usage

Methods

There are two methods available:

  • stringCompare() - Determines how closely an input string matches a pattern and returns a score between 0 and 1 (1.0 for an exact match, as in the first example above)
  • stringEditDistance() - Determines how closely an input string matches a pattern and returns the number of edits the input string needs in order to match the pattern
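
This README does not say how the score or the edit count is computed. Assuming the textbook Levenshtein distance (insertions, deletions and substitutions), the sketch below shows how an edit count and a 0-to-1 score can relate: the score divides the distance by the longer string's length. It ignores wildcards, uses hypothetical names, and will not necessarily reproduce the scores in the examples above.

```java
/** Hypothetical sketch: Levenshtein distance and a normalized similarity derived from it. */
final class EditDistanceSketch {
    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings, 0.0 when the distance equals the longer string's length. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(similarity("years old", "years old."));
    }
}
```

Here levenshtein("kitten", "sitting") is 3, and similarity("years old", "years old.") is 0.9 (one insertion over ten characters).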

Parameters

Both methods can take the same parameters:

PARAMETER          TYPE                  DESCRIPTION
pattern            String                The pattern to compare against (the wildcard symbol is ¿)
text               String                The input string to compare to the pattern
vars               List<List<String>>    If included, populated with the variables extracted during the comparison (one inner list per extraction)
tokens             List<List<String>>    If included, populated with the tokens extracted during the comparison (one inner list per extraction)
ignoreCase         boolean               When true, the comparison is case-insensitive (false by default)
ignorePunctuation  boolean               When true, punctuation is ignored during the comparison (false by default)
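
One plausible way to honor ignoreCase and ignorePunctuation is to normalize both strings before comparing them. The helper below is a hypothetical sketch, not the library's code. Java's \p{Punct} class matches ASCII punctuation only (unless Pattern.UNICODE_CHARACTER_CLASS is set), so the ¿ wildcard survives the stripping.

```java
import java.util.Locale;

/** Hypothetical pre-processing for the ignoreCase / ignorePunctuation flags. */
final class CompareOptions {
    static String normalize(String s, boolean ignoreCase, boolean ignorePunctuation) {
        String out = s;
        if (ignorePunctuation) {
            out = out.replaceAll("\\p{Punct}", ""); // ASCII punctuation only; U+00BF is kept
        }
        if (ignoreCase) {
            out = out.toLowerCase(Locale.ROOT); // locale-independent lowercasing
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize("My names John and I'm 30 years old.", true, true));
    }
}
```

With both flags set, the second input from the Standard example becomes "my names john and im 30 years old".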

Overloads

Both methods share the same overloads, built from the parameters defined above (an X marks a parameter the overload accepts):

pattern text vars tokens ignoreCase ignorePunctuation
X X
X X X
X X X X
X X X X
X X X X X
X X X X X X
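
In Java, overloads like these commonly delegate to the full six-parameter version with the defaults filled in: no output lists, and both flags false as stated in the parameter table. The sketch below shows that delegation for three combinations. OverloadSketch and compare are hypothetical names. Treating a null vars or tokens argument as "not requested" is an assumption modeled on the table's "If included" wording. The full method's body is a placeholder (exact equality after optional normalization), not the fuzzy algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical class: shorter overloads delegating to one full-argument method. */
final class OverloadSketch {
    static final String WILDCARD = "\u00BF"; // the inverted question mark wildcard

    // Full overload. Placeholder body: 1.0 on exact equality after optional normalization, else 0.0.
    // A null vars/tokens list is assumed to mean "not requested"; vars is never filled here.
    static double compare(String pattern, String text, List<List<String>> vars,
                          List<List<String>> tokens, boolean ignoreCase, boolean ignorePunctuation) {
        if (tokens != null) {
            // record the literal tokens around the wildcards as a single extraction
            tokens.add(new ArrayList<>(Arrays.asList(pattern.split(Pattern.quote(WILDCARD), -1))));
        }
        String p = pattern;
        String t = text;
        if (ignorePunctuation) {
            p = p.replaceAll("\\p{Punct}", "");
            t = t.replaceAll("\\p{Punct}", "");
        }
        if (ignoreCase) {
            p = p.toLowerCase(Locale.ROOT);
            t = t.toLowerCase(Locale.ROOT);
        }
        return p.equals(t) ? 1.0 : 0.0;
    }

    // pattern + text: no output lists, both flags default to false
    static double compare(String pattern, String text) {
        return compare(pattern, text, null, null, false, false);
    }

    // pattern + text + vars
    static double compare(String pattern, String text, List<List<String>> vars) {
        return compare(pattern, text, vars, null, false, false);
    }

    // pattern + text + both flags, no output lists
    static double compare(String pattern, String text, boolean ignoreCase, boolean ignorePunctuation) {
        return compare(pattern, text, null, null, ignoreCase, ignorePunctuation);
    }

    public static void main(String[] args) {
        System.out.println(compare("Hello, World", "hello world"));             // defaults: flags off
        System.out.println(compare("Hello, World", "hello world", true, true)); // both flags on
    }
}
```

Keeping all logic in the full overload means the defaults live in exactly one place per shorter signature, so the variants cannot drift apart.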

 


About

License: GNU General Public License v3.0


Languages

Language: Java 100.0%