patgrasso / tokensregex

JS implementation of the Stanford NLP TokensRegex parser

TokensRegex

TokensRegex is a tool for defining patterns over text/sequences of tokens, with an emphasis on the use of attributes of the text/tokens, such as part of speech or recognized entities. Instead of operating on individual characters, as ordinary regular expression systems do, TokensRegex operates on tokens. For example:

play ([{ tag:NN }]+) by ([]+)

This is a JavaScript implementation of the Stanford TokensRegex system. Its expression grammar is adapted from that of Stanford's own TokensRegex system; the Stanford TokensRegex documentation gives a full description of the original grammar.
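To make the token-level idea concrete, here is a minimal hand-written sketch of what the pattern play ([{ tag:NN }]+) by ([]+) describes. It does not use this library and is not how the library is implemented. The token objects, their text and tag fields, and the matchSongCommand helper are assumptions made for illustration.

```javascript
// Hypothetical pre-tagged tokens. A real pipeline would get these from a
// tokenizer and part-of-speech tagger; the field names are assumptions.
const tokens = [
  { text: 'play', tag: 'VB' },
  { text: 'jazz', tag: 'NN' },
  { text: 'music', tag: 'NN' },
  { text: 'by', tag: 'IN' },
  { text: 'Miles', tag: 'NNP' },
  { text: 'Davis', tag: 'NNP' },
];

// Hand-rolled equivalent of: play ([{ tag:NN }]+) by ([]+)
// Simplifications: the match is anchored at the first token and
// does no backtracking, which is enough for this input.
function matchSongCommand(toks) {
  let i = 0;

  // Literal token "play".
  if (i >= toks.length || toks[i].text !== 'play') return null;
  i += 1;

  // ([{ tag:NN }]+): one or more tokens whose tag annotation is NN.
  const title = [];
  while (i < toks.length && toks[i].tag === 'NN') {
    title.push(toks[i].text);
    i += 1;
  }
  if (title.length === 0) return null;

  // Literal token "by".
  if (i >= toks.length || toks[i].text !== 'by') return null;
  i += 1;

  // ([]+): one or more tokens of any kind; here, everything that remains.
  const artist = toks.slice(i).map((t) => t.text);
  if (artist.length === 0) return null;

  return { title, artist };
}

console.log(matchSongCommand(tokens));
// { title: [ 'jazz', 'music' ], artist: [ 'Miles', 'Davis' ] }
```

The point of the sketch is that each pattern element consumes whole tokens and can inspect their annotations, which a character-level regular expression cannot do directly.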

Usage

const TokensRegex = require('tokensregex');

let songCommand = new TokensRegex('play ([{ tag:NN }]+) by ([]+)');

songCommand.test('play banana pancakes by Jack Johnson'); // true
songCommand.test('what is the capital of Minnesota?');    // false

'play Mr. Brightside by The Killers'.match(songCommand);  // { match object }

As you can see, TokensRegex operates very similarly to JavaScript's RegExp. TokensRegex extends RegExp and overrides all relevant functionality, so it can be used with the string methods match() and replace(), or via the RegExp exec() and test() methods.
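The following sketch shows the general JavaScript mechanism that makes this kind of integration possible: a class that extends RegExp and overrides exec() is picked up by test(), match(), and replace(), because the built-in implementations of those methods call the object's exec(). The TokenWord class is hypothetical and far simpler than TokensRegex; it is not taken from this library's source.

```javascript
// Hypothetical illustration only: matches a single whitespace-separated
// token exactly, instead of matching characters.
class TokenWord extends RegExp {
  constructor(word) {
    super('');          // the underlying character pattern is unused
    this.word = word;
  }

  // Overridden exec(): returns a match-like array (with index and input)
  // or null, which is the shape the built-in string methods expect.
  exec(str) {
    const text = String(str);
    let offset = 0;
    for (const token of text.split(' ')) {
      if (token === this.word) {
        const result = [token];
        result.index = offset;
        result.input = text;
        return result;
      }
      offset += token.length + 1; // +1 for the separating space
    }
    return null;
  }
}

const play = new TokenWord('play');

console.log(play.test('please play music'));             // true
console.log(play.test('playing music'));                 // false
console.log('please play music'.match(play).index);      // 7
console.log('please play music'.replace(play, 'pause')); // please pause music
```

Note that 'playing music' does not match, since the comparison is made per token rather than per character.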

Syntax

Each component in a TokensRegex expression operates on a token (typically a word, though it can be any kind of token). Some operators, such as +, *, and ?, will be familiar from ordinary regular expressions.

Symbol Meaning
All
[] Any token
Strings
"abc" The text of the token exactly equals the string abc.
/abc/ The text of the token matches the regular expression specified by abc.
{ key:"abc" } The token annotation corresponding to key matches the string abc exactly.
{ key:/abc/ } The token annotation corresponding to key matches the regular expression specified by abc.
Numerics
{ key==number } The token annotation corresponding to key is equal to number.
{ key!=number } The token annotation corresponding to key is not equal to number.
{ key>number } The token annotation corresponding to key is greater than number.
{ key<number } The token annotation corresponding to key is less than number.
{ key>=number } The token annotation corresponding to key is greater than or equal to number.
{ key<=number } The token annotation corresponding to key is less than or equal to number.
Boolean checks
{ key::IS_NUM } The token annotation corresponding to key is a number.
{ key::IS_NIL } or { key::NOT_EXISTS } The token annotation corresponding to key does not exist.
{ key::NOT_NIL } or { key::EXISTS } The token annotation corresponding to key exists.
Sequencing
X Y X followed by Y
X | Y X or Y
X & Y X and Y
Groups
(X) X as a capturing group
(?$name X) X as a capturing group with name name
(?: X) X as a non-capturing group
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n times but no more than m times
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n times but no more than m times

Rules with a strikethrough are not yet implemented.
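The greedy and reluctant rows in the table carry identical descriptions, so the difference is easiest to see by example. The sketch below uses JavaScript's built-in character-level RegExp as an analogy; it does not call this library, and whether the token-level quantifiers behave identically in every case is an assumption based on the table.

```javascript
const command = 'play stand by me by Ben E. King';

// Greedy: the first group takes as much as it can while still letting
// the rest of the pattern match, so it splits on the LAST " by ".
const greedy = /play (.+) by (.+)/.exec(command);
console.log(greedy[1]); // stand by me
console.log(greedy[2]); // Ben E. King

// Reluctant: the first group takes as little as it can, so it stops
// at the FIRST " by ".
const reluctant = /play (.+?) by (.+)/.exec(command);
console.log(reluctant[1]); // stand
console.log(reluctant[2]); // me by Ben E. King
```

Both forms accept the same inputs; they differ only in how the matched text is divided among the groups.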

License

MIT
