wjbmattingly / spacyex

SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

spaCyEx

PyPI version GitHub stars GitHub forks

logo

spaCyEx is a powerful extension for spaCy, designed to make pattern matching as flexible and easy as using regular expressions. It builds upon the existing capabilities of spaCy's Matcher, enhancing it with a more accessible syntax for defining complex patterns. spaCyEx allows for intuitive and detailed text pattern specifications, perfect for extracting detailed linguistic features from texts.

Installation

You can install spaCyEx via pip:

pip install spacyex

Features

  • Dynamic Pattern Creation: Create complex token matching patterns using a simple string-based syntax.
  • Integration with spaCy: Leverage spaCy's Matcher capabilities to find sequences in text that match defined patterns.
  • Customizable Matching Rules: Define token attributes including text characteristics, lexical attributes, and grammatical properties.

Creating Patterns

Define patterns using a string syntax where each token and its attributes are encapsulated by parentheses. Token attributes are specified by key-value pairs, separated by an equals sign (=), and multiple attributes are divided by a pipe (|).

Syntax Examples

  • Single Attribute: (pos=NOUN)
  • Multiple Attributes: (pos=NOUN|lemma=run)
  • Using List Values: (lemma=in[run,walk])
  • Using Operators: (ent_type=person|op={2,3})

Pattern Matching

Once a pattern is defined, it can be used to search text for matches.

Usage

Here is a simple example to get started with spaCyEx:

import spacyex as se
import spacy

nlp = spacy.load("en_core_web_sm")
text = "John Smith runs fast, but Jacob Smith walks slowly."
pattern = "(ent_type=person|op={2}) (lemma=in[run,walk]) (pos=ADV)"

results = se.search(pattern, text, nlp)
for match in results:
    print(match[0].text, "Start:", match[1], "End:", match[2])

This code will match sequences in the text based on the defined pattern, using named entities, lemmas, and parts of speech.

Roadmap

  • Support for all dictionary properties in patterns.
  • Additional utilities and helper functions for more complex pattern scenarios.

About

SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.


Languages

Language:Python 58.6%Language:Jupyter Notebook 41.4%