This library is for creating and composing parsers.
For example:
local grammar = require("leftry")
local factor = grammar.factor
local span = grammar.span
# Declaring a Non-Terminal, "A"
local A = factor("A", function(A) return
span(A, "1"), # 1st alternative, A := A "1"
"1" # 2nd alternative, A := "1"
end)
# Declaring a Non-Terminal, "B"
local B = factor("B", function(B) return
span(B, "2"), # 1st alternative, B := B "2"
A # 2nd alternative, B := A
end)
# Using the composed parser.
# The first argument is the input string.
# The second argument is the string index to start from.
print(B("111122222", 1))
This creates a parser B
that can parse the string "111122222".
The purpose of the anonymous function in declaration of the non-terminal enables self-reference and referencing other non-terminals that are not fully initialized yet.
luarocks install --server=http://luarocks.org/dev leftry
First trace with a left-factored version of the grammar, then apply the left-recursive grammar. Like how a human would intuitively do it.
lua test.lua
- A parser has the following function signature.
rest, values = parser(invariant, position, [peek])
rest
is the next index to parse.rest-1
is the last index of the parsed value. Ifrest
isnil
, it means the parse is invalid.values
is the data created as a result of a successful parse.invariant
is the Lua string that is being parsed.position
is the integer index to start the parse from. It must be 1 or greater.peek
(optional). If true, validates only, andvalues
will benil
. Validating without creating the data is several times faster.
-
As iterator:
The function signature of a parser allows it to be used as a Lua iterator in a for-loop.
local actual = {} for rest, values in span("1", "2", "3"), "123123123", 1 do table.insert(actual, rest) end -- actual == {4, 7, 10}
This can be useful, for example in a programming language parser, to iterate through each parsed statement.
-
Composition:
Parsers can be nested. Left recursion is allowed.
local A = factor("A", function(A) return span(A, "1"), "1" end) local B = factor("B", function(B) return span(B, "2"), A end) local src = "11112222" assert(B(src, 1, true) == #src + 1)
-
Data initialization:
You can customise how the data is generated from a parse.
local A = factor("A", function(A) return span(A, "1") % function(initial, value) return (initial or "") .. value end, "1" end) local src = "111" A(src, 1) -- returns #src + 1, "111"
Leftry can handle some examples of left recursion.
-
Simple left recursion:
local A = factor("A", function(A) return span(A, "1"), "1" end)
-
Nested left recursion:
local A = factor("A", function(A) return span(A, "1"), "1" end) local B = factor("B", function(B) return span(B, "2"), A end)
Performance of the built-in Lua parser.
- Macbook Pro 2.6 Ghz i7 with 16 GB RAM:
- Lua:
- Validate the grammar of around 0.4 megabytes of Lua per second.
- Parse 0.1 megabytes of Lua into abstract syntax tree representation per second.
- LuaJIT:
- Validate the grammar of around 4 megabytes of Lua per second.
- Parse 0.68 megabytes of Lua into abstract syntax tree representation per second.
- For comparison:
- Lua interpreter can load a 15 megabyte Lua function in one second.
- LuaJIT can load a 25 megabyte Lua function in one second.
- Lua:
-
factor(name, generate, [initializer])
Create a non-terminal element.
name
is the tostring value of the element.generate
is the function that, when called with this element, returns the definition of this non-terminal. The values returned with this function will be wrapped in anany
. You may optionally, explicitly return a singleany
that contains all the alternatives. Strings literals are automatically converted intoterm
elements.initializer
is the function that will be called with values parsed from this element to let the user convert the parsed value into something useful. See "Data Constructors" section.
Usage:
local A = factor("A", function(A) return span(A, "1"), "1" end)
-
rep(element, [reducer])
Create a repeating element. It can be used only in an
any
or aspan
.element
is the element that can appear 0 or more times (if this element is in aspan
), or 1 or more times (if this element is in anany
). Strings literals are automatically converted intoterm
elements.reducer
is the function that will be called with values parsed from this element to let the user convert the parsed values into something useful. See "Data Constructors" section.
Usage:
span( "1", rep("2", function(a, b) return (a or "")..b end), "3")
-
opt(element)
Create an optional element. It can be used only in a
span
.element
is the element that can appear 0 or 1 times. Strings literals are automatically converted intoterm
elements.
Usage:
span("1", opt("2"), "3")
-
any(...)
Create an any element. The any element contains a set of alternatives, that will be attempted from left to right.
The
any
element is used internally to wrap the alternatives returned by thefactor
"generate" function.You do not need to worry about using
any
. -
span(...)
Create an span element. The span element can be assigned a reducer with the
%
operator. See the "Data Constructors" section....
the children of the span element. Each child must be encountered in order to provide a valid parse, unless it is anopt
orrep
element.
Usage:
local A = factor("A", function(A) return span(A, "1") % function(initial, value) return (initial or "") .. value end, "1" end)
The span element can also be assigned a spacing rule using the
^
operator:local function span(...) -- Apply spacing rule to all spans we use in the Lua grammar. return grammar.span(...) ^ {spacing=spacing, spaces=" \t\r\n"} end
See built-in Lua parser for an example on what spacing function looks like.
-
term(literal, [initializer])
Create a literal element.
literal
the string literal that must be encountered to provide a valid parse.initializer
is the function that will be called with the literal whenever there is a valid parse.
Usage:
term("hello")
-
factor
function( value, -- The parsed value to transform. self, -- The element responsible for parsing being transformed. position, -- The index where `value` was found. rest, -- The next index after `value` ends. choice) -- The index of the alternative `value` was parsed from. -- Default implementation return value end
-
term
function( value, -- The parsed value to transform. (The literal.) self, -- The element responsible for parsing being transformed. position, -- The index where `value` was found. rest) -- The next index after `value` ends. -- Default implementation return value end
Reducers are functions that will be folded over each value that will be parsed.
-
span
function( accumulated, -- The accumulated value. In the first iteration it is `nil`. value, -- The current value that is parsed. self, -- The element parsing the current value. position, -- The index where the current value begins. rest, -- The next index after `value` ends. i) -- `value` belongs to the `i`th element of this `span` element. -- Default implementation return rawset(initial or {}, i, value) end
-
rep
function( accumulated, -- The accumulated value. In the first iteration it is `nil`. value, -- The current value that is parsed. self, -- The element parsing the current value. position, -- The index where the current value begins. rest, -- The next index after `value` ends. i) -- The `i`th time the child element has been encountered. -- Default implementation return rawset(initial or {}, i, value) end
The current implementation does not enforce the following rules properly.
-
A
span
must have more than one child. -
In a left recursion alternative, only the first element may be the left-recurring non-terminal. More than one consecutive left-recurring non-terminal is not supported, even if it currently works.
-- This is OK local A = factor("A", function(A) return span(A, "1", A, "2"), "1" end) -- This is not OK local A = factor("A", function(A) return span(A, A, "1", "2"), "1" end)
-
An
any
element must not have anyopt
children. -
A
rep
element that is a child of anany
element requires 1 or more elements to match. -
A
rep
element that is a child of anspan
element requires 0 or more elements to match. -
The first nonterminal element of a span that is part of a left recursion path, cannot be wrapped in
opt
orrep
.
-
Implement Lua grammar in Leftry to prove it can handle the grammar of a programming language. -
Implement whitespacing support. -
Add appropriate data initializers for the builtin Lua parser, to override the default ones. -
Test the following grammar:
"b" | s ~ s ^^ { _ + _ } | s ~ s ~ s ^^ { _ + _ + _ } https://github.com/djspiewak/gll-combinators