TokensRegex
TokensRegex is a tool for defining patterns over text/sequences of tokens, with an emphasis on the use of attributes of the text/tokens, such as part of speech or recognized entities. Instead of operating on individual characters, as ordinary regular expression systems do, TokensRegex operates on tokens. For example:
play ([{ tag:NN }]+) by ([]+)
This is a JavaScript implementation of the Stanford TokensRegex system. Its expression grammar is adapted from Stanford's; a full description of the original grammar can be found at the link above.
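To make the token-level idea concrete, here is a minimal hand-written sketch of what the example expression asks for, evaluated over an already tagged token list. The token objects, the simplified tag values, and the function name matchSongCommand are assumptions made for illustration; they are not part of this package's API, and the sketch does no backtracking.

```javascript
// Hypothetical pre-tagged tokens; a real pipeline would get these from a tagger.
// Tags are simplified for the example.
const tokens = [
  { text: 'play', tag: 'VB' },
  { text: 'banana', tag: 'NN' },
  { text: 'pancakes', tag: 'NN' },
  { text: 'by', tag: 'IN' },
  { text: 'Jack', tag: 'NNP' },
  { text: 'Johnson', tag: 'NNP' },
];

// Hand-written equivalent of: play ([{ tag:NN }]+) by ([]+)
function matchSongCommand(toks) {
  // Literal token "play".
  if (toks.length === 0 || toks[0].text !== 'play') return null;

  // ([{ tag:NN }]+): greedily consume one or more tokens tagged NN.
  let i = 1;
  while (i < toks.length && toks[i].tag === 'NN') i++;
  if (i === 1) return null; // "+" requires at least one token
  const title = toks.slice(1, i);

  // Literal token "by".
  if (i >= toks.length || toks[i].text !== 'by') return null;

  // ([]+): one or more tokens of any kind.
  const artist = toks.slice(i + 1);
  if (artist.length === 0) return null;

  const join = (ts) => ts.map((t) => t.text).join(' ');
  return { title: join(title), artist: join(artist) };
}

const result = matchSongCommand(tokens);
console.log(result); // { title: 'banana pancakes', artist: 'Jack Johnson' }
```

Each step tests a whole token (its text or its tag), never individual characters, which is the difference from an ordinary regular expression.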
Usage
const TokensRegex = require('tokensregex');
let songCommand = new TokensRegex('play ([{ tag:NN }]+) by ([]+)');
songCommand.test('play banana pancakes by Jack Johnson'); // true
songCommand.test('what is the capital of Minnesota?'); // false
'play Mr. Brightside by The Killers'.match(songCommand); // { match object }
As you can see, TokensRegex operates very similarly to JavaScript's RegExp. TokensRegex extends RegExp and overrides all relevant functionality, so it can be used with the string methods match() and replace(), or via the RegExp exec() and test() methods.
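The mechanism that lets a RegExp subclass work with string methods can be sketched with a toy class. This is not this package's implementation: the class name TokenEquals and its behavior (matching one whole whitespace-separated token) are assumptions chosen only to show how overriding exec(), test(), and Symbol.match makes an object usable with String.prototype.match().

```javascript
// Toy subclass (hypothetical, for illustration): matches a whole
// whitespace-separated token instead of a character sequence.
class TokenEquals extends RegExp {
  constructor(word) {
    super(''); // the underlying character pattern is unused
    this.word = word;
  }

  // Returns a match array like RegExp.prototype.exec, or null.
  exec(str) {
    const tokens = String(str).split(/\s+/).filter(Boolean);
    const index = tokens.indexOf(this.word);
    if (index === -1) return null;
    const result = [this.word];
    result.index = index; // token index, not character offset
    result.input = str;
    return result;
  }

  test(str) {
    return this.exec(str) !== null;
  }

  // String.prototype.match(obj) delegates to obj[Symbol.match](string).
  [Symbol.match](str) {
    return this.exec(str);
  }
}

const play = new TokenEquals('play');
console.log(play instanceof RegExp);          // true
console.log(play.test('please play a song')); // true
console.log(play.test('playing now'));        // false
console.log('now play it'.match(play).index); // 1
```

Supporting replace() in the same way would mean overriding Symbol.replace as well.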
Syntax
Each component in a TokensRegex expression operates on a token (typically a word, but it can be any kind of token). Some operators, such as +, *, and ?, may seem familiar.
Symbol | Meaning |
---|---|
All | |
[] | Any token |
Strings | |
"abc" | The text of the token exactly equals the string abc. |
/abc/ | The text of the token matches the regular expression specified by abc. |
{ key:"abc" } | The token annotation corresponding to key matches the string abc exactly. |
{ key:/abc/ } | The token annotation corresponding to key matches the regular expression specified by abc. |
Numerics | |
{ key==number } | The token annotation corresponding to key is equal to number. |
{ key!=number } | The token annotation corresponding to key is not equal to number. |
{ key>number } | The token annotation corresponding to key is greater than number. |
{ key<number } | The token annotation corresponding to key is less than number. |
{ key>=number } | The token annotation corresponding to key is greater than or equal to number. |
{ key<=number } | The token annotation corresponding to key is less than or equal to number. |
Boolean checks | |
{ key::IS_NUM } | The token annotation corresponding to key is a number. |
{ key::IS_NIL } or { key::NOT_EXISTS } | The token annotation corresponding to key does not exist. |
{ key::NOT_NIL } or { key::EXISTS } | The token annotation corresponding to key exists. |
Sequencing | |
X Y | X followed by Y |
X \| Y | X or Y |
X & Y | X and Y |
Groups | |
(X) | X as a capturing group |
(?$name X) | X as a capturing group with name name |
(?: X) | X as a non-capturing group |
Greedy quantifiers | |
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n times but no more than m times |
Reluctant quantifiers | |
X?? | X, once or not at all |
X*? | X, zero or more times |
X+? | X, one or more times |
X{n}? | X, exactly n times |
X{n,}? | X, at least n times |
X{n,m}? | X, at least n times but no more than m times |
Rules with a strikethrough are not yet implemented.
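The difference between greedy and reluctant quantifiers is easiest to see on an ambiguous input. The following sketch does not use this package; the function splitCommand is a hypothetical helper that works out by hand how a greedy and a reluctant first group in an expression of the form play (TITLE) by ([]+) would divide the same token sequence.

```javascript
const tokens = 'play stand by me by Ben'.split(' ');

// Returns [title, artist] for: play (TITLE) by ([]+), where TITLE behaves
// like a greedy "one or more" when greedy is true and like a reluctant
// "one or more" when greedy is false.
function splitCommand(toks, greedy) {
  if (toks[0] !== 'play') return null;

  // Valid positions for the literal "by": at least one token on each side.
  const candidates = [];
  for (let i = 2; i < toks.length - 1; i++) {
    if (toks[i] === 'by') candidates.push(i);
  }
  if (candidates.length === 0) return null;

  // Greedy takes the longest title that still lets the rest match;
  // reluctant takes the shortest.
  const by = greedy ? candidates[candidates.length - 1] : candidates[0];
  return [toks.slice(1, by).join(' '), toks.slice(by + 1).join(' ')];
}

console.log(splitCommand(tokens, true));  // [ 'stand by me', 'Ben' ]
console.log(splitCommand(tokens, false)); // [ 'stand', 'me by Ben' ]
```

Both variants match the same inputs; they differ only in where the group boundaries fall.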
License
MIT