comb allows you to write parsers in a tagged template literal which can call JavaScript functions that process your tree as you generate it. It's built using parser combinators. The comb language parser itself uses the same underlying parser combinator functions that are used to parse languages whose grammars are written in comb. Yay for recursion!
comb is <4 kB minified.
Here is a comb program for some simple arithmetic. The program is parsed and evaluated in a single pass.
import { comb } from "./comb.js";
const skip = ["whitespace"];
const rules = {
  // literals
  "(": "(",
  ")": ")",
  "-": "-",
  "+": "+",
  "/": "/",
  "*": "*",
  "^": "^",
  // can take regex
  number: /[0-9]+\.?[0-9]*/,
  whitespace: /[^\S\r\n]+/,
  // can also use array as value, will match any element
  // ops: ["-", "+", "*", "/", "^"]
}
const parse = comb`
lexer ${ { rules, skip } }
number = 'number'
number -> ${x => Number(x.value)}
neg = '-' 'number'
neg -> ${x => -Number(x[1].value)}
op = '+' | '-' | '*' | '/' | '^'
op -> ${x => x.value}
paren = '(' ( exp | paren | neg | number ) ')'
paren -> ${x => x[1]}
binary = ( number | neg | paren ) op ( binary | paren | neg | number )
binary -> ${x => applyPrecedence(x)}
exp = binary | paren | neg | number
exp -> ${x => evalResult(x)}
exp
`
// implementation of each binary operator
const funcs = {
  "*": (x, y) => x*y,
  "/": (x, y) => x/y,
  "+": (x, y) => x+y,
  "-": (x, y) => x-y,
  "^": (x, y) => x**y
}
// map each operator to its precedence level (higher binds tighter)
const operators = [
  ["+", "-"],
  ["*", "/"],
  ["^"],
].reduce( (acc, cur, i) => {
  cur.forEach(op => acc[op] = i);
  return acc;
}, {});
const getPrecedence = (op) => operators[op];
// the grammar nests binary expressions to the right; rotate the tree
// when the outer operator binds at least as tightly as the inner one
const applyPrecedence = exp => {
  if (!Array.isArray(exp)) return exp;
  const [ first, op, second ] = exp;
  return (Array.isArray(second) && getPrecedence(op) >= getPrecedence(second[1]))
    ? [
        applyPrecedence([ first, op, second[0] ]),
        second[1],
        second[2]
      ]
    : exp;
}
// recursively evaluate a [ left, op, right ] tree; leaves are numbers
const evalResult = (node) => {
  if (typeof node === "number") return node;
  else {
    const [ left, op, right ] = node;
    return funcs[op](evalResult(left), evalResult(right));
  }
};
const result = parse("2^2 * (3 - 1) - 2^2");
console.log(result); // 4
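To see what applyPrecedence is doing, it can help to run it on a hand-built tree without the parser. The sketch below copies the helpers from the program above and assumes the binary rule produces right-nested [ left, op, right ] arrays, which is what the grammar suggests; the input trees are written by hand, not produced by comb.

```javascript
// Copies of the helpers from the arithmetic example, runnable on their own.
const funcs = {
  "*": (x, y) => x * y,
  "/": (x, y) => x / y,
  "+": (x, y) => x + y,
  "-": (x, y) => x - y,
  "^": (x, y) => x ** y,
};
const precedence = { "+": 0, "-": 0, "*": 1, "/": 1, "^": 2 };

const applyPrecedence = (exp) => {
  if (!Array.isArray(exp)) return exp;
  const [first, op, second] = exp;
  // Rotate when the outer operator binds at least as tightly as the inner one.
  return Array.isArray(second) && precedence[op] >= precedence[second[1]]
    ? [applyPrecedence([first, op, second[0]]), second[1], second[2]]
    : exp;
};

const evalResult = (node) => {
  if (typeof node === "number") return node;
  const [left, op, right] = node;
  return funcs[op](evalResult(left), evalResult(right));
};

// Assumed raw tree for "2 * 3 + 4": right-nested, so "*" sits above "+".
const rotated = applyPrecedence([2, "*", [3, "+", 4]]);
console.log(JSON.stringify(rotated)); // [[2,"*",3],"+",4]
console.log(evalResult(rotated)); // 10

// Assumed raw tree for "2 + 3 * 4": already correct, so it is left alone.
const untouched = applyPrecedence([2, "+", [3, "*", 4]]);
console.log(JSON.stringify(untouched)); // [2,"+",[3,"*",4]]
console.log(evalResult(untouched)); // 14
```

Because the binary transformer runs on the inner expression first, each call only has to compare the outer operator with the one directly below it.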
Let's take a closer look at how this program works.
The lexer can take rules for tokenizing the program and an array of what tokens to skip.
These rules can be String, RegEx, or Array.
You can also pass a function to lexer which takes a string and returns an array of lexemes/tokens.
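As a rough sketch of what such a function could look like, here is a hand-written tokenizer for the arithmetic example. The name tokenize is hypothetical, and the { type, value, index } token shape is an assumption borrowed from the object that transformers receive; check the comb source for the exact shape the parser expects.

```javascript
// Hypothetical custom lexer: takes a string, returns an array of tokens.
// Assumption: tokens look like { type, value, index }.
const tokenize = (input) => {
  // Sticky (y) regexes only match at exactly lastIndex.
  const patterns = [
    ["number", /[0-9]+\.?[0-9]*/y],
    ["whitespace", /[^\S\r\n]+/y],
    ["punctuation", /[-+*\/^()]/y],
  ];
  const tokens = [];
  let index = 0;
  while (index < input.length) {
    let matched = false;
    for (const [name, regex] of patterns) {
      regex.lastIndex = index;
      const m = regex.exec(input);
      if (!m) continue;
      if (name !== "whitespace") {
        // As in the rules object, literals use the literal itself as the type.
        const type = name === "punctuation" ? m[0] : name;
        tokens.push({ type, value: m[0], index });
      }
      index += m[0].length;
      matched = true;
      break;
    }
    if (!matched) {
      throw new Error(`Unexpected character "${input[index]}" at ${index}`);
    }
  }
  return tokens;
};

console.log(tokenize("12 + 3.5"));
// [
//   { type: "number", value: "12", index: 0 },
//   { type: "+", value: "+", index: 3 },
//   { type: "number", value: "3.5", index: 5 }
// ]
```

Whitespace is dropped inside the function here, which plays the role of the skip array in the rules-based form.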
Rules are defined with an =. Adjacent terms mean "and", and | means "or".
There are three modifiers:
- * is zero or more
- + is one or more
- ? is optional
You can use ( and ) to adjust precedence.
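For readers new to parser combinators, here is a minimal sketch of how "and", "or", and the three modifiers can be built as plain functions. This is not comb's actual implementation; every name here (token, seq, or, many, many1, opt, map) is made up for illustration, and the tokens are written by hand.

```javascript
// A parser takes (tokens, pos) and returns { value, pos } or null on failure.
const token = (type) => (toks, pos) =>
  pos < toks.length && toks[pos].type === type
    ? { value: toks[pos], pos: pos + 1 }
    : null;

// Adjacent terms ("and"): every parser must match, in order.
const seq = (...parsers) => (toks, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(toks, pos);
    if (!r) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// "|" ("or"): the first alternative that matches wins.
const or = (...parsers) => (toks, pos) => {
  for (const p of parsers) {
    const r = p(toks, pos);
    if (r) return r;
  }
  return null;
};

// "*": zero or more. Stops if a match consumes nothing, to avoid looping.
const many = (p) => (toks, pos) => {
  const values = [];
  let r;
  while ((r = p(toks, pos)) && r.pos > pos) {
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// "+": one or more.
const many1 = (p) => (toks, pos) => {
  const r = many(p)(toks, pos);
  return r.value.length > 0 ? r : null;
};

// "?": optional, yields null when absent.
const opt = (p) => (toks, pos) => p(toks, pos) ?? { value: null, pos };

// "->": transform the matched value.
const map = (p, fn) => (toks, pos) => {
  const r = p(toks, pos);
  return r ? { value: fn(r.value), pos: r.pos } : null;
};

// Roughly the comb-style grammar (hypothetical):
//   signed = '-'? 'number'
//   sum    = signed ( '+' signed )*
const signed = map(
  seq(opt(token("-")), token("number")),
  ([sign, n]) => (sign ? -1 : 1) * Number(n.value)
);
const sum = map(
  seq(signed, many(seq(token("+"), signed))),
  ([first, rest]) => rest.reduce((acc, [, n]) => acc + n, first)
);

// Hand-written tokens for "-1+2+3".
const toks = ["-", "1", "+", "2", "+", "3"].map((value, index) => ({
  type: /[0-9]/.test(value) ? "number" : value,
  value,
  index,
}));

console.log(sum(toks, 0)); // { value: 4, pos: 6 }
```

Parentheses in a grammar correspond to nesting these calls, as in the many(seq(...)) above.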
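Transformers (->) are JavaScript functions that receive the elements matched by the rule's term. In the example above, a multi-element term such as neg or paren receives an array (x[1]), while a single-element term such as number receives the element itself (x.value). Each token element will be an object with: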
{
type,
value,
index
}
The return value of the transformer is what the parser returns for that rule, in place of the default parsed syntax tree.
The last line in the comb program is what gets returned; in the example above, that is exp.