Is such a grammar supported?
davidarnold opened this issue
I'm trying to implement a parsing rule for a strange feature of LISP 1.5:
> There is a provision for reading in atomic symbols containing arbitrary characters. This is done by punching the form `$$dsd`, where `s` is any string of up to 30 characters, and `d` is any character not contained in the string `s`. Only the string `s` is used in forming the print name of the atomic symbol; `d` and the dollar signs will not appear when the atomic symbol is printed out.
So, as an example, `$$XFOOX` should yield `FOO`.
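For reference, the rule itself is small enough to read by hand. The following is a minimal sketch in plain Elixir, with no parser library, assuming ASCII input (one byte per character) and that `s` must contain 1 to 30 characters (the same bounds used by the regex and combinators in this thread). `DollarAtom` and `read/1` are hypothetical names, and unlike the later NimbleParsec version the sketch does not restrict `s` to the BCD character set.

```elixir
defmodule DollarAtom do
  # Longest print name allowed by the LISP 1.5 rule quoted above.
  @max_length 30

  # Returns {:ok, name, rest_of_input} or {:error, reason}.
  def read(<<"$$", delimiter::binary-size(1), rest::binary>>) do
    # `d` may not occur inside `s`, so the first `d` after the opening one
    # closes the atom.
    case String.split(rest, delimiter, parts: 2) do
      [name, after_delimiter] when byte_size(name) in 1..@max_length ->
        {:ok, name, after_delimiter}

      [_name, _after_delimiter] ->
        {:error, "name must be 1 to #{@max_length} characters"}

      [_] ->
        {:error, "no closing #{delimiter} found"}
    end
  end

  def read(_input), do: {:error, "expected $$ followed by a delimiter"}
end
```

With this sketch, `DollarAtom.read("$$XFOOX")` returns `{:ok, "FOO", ""}`, while `DollarAtom.read("$$XFOO")` returns an error because no closing `X` follows.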
This rule can be implemented with a regular expression, `\$\$(\w)(?:(?!\1).){1,30}\1`, but I don't see any backreference-like construct in NimbleParsec. My best attempt is the following, but of course `X` is hard-coded, so it doesn't work for the general case.
```elixir
escaped_atom =
  ignore(string("$$"))
  |> ignore(string("X"))
  |> times(lookahead_not(string("X")) |> ascii_char([?0..?9, ?A..?Z]), min: 1, max: 30)
  |> ignore(string("X"))
  |> reduce({:to_string, []})
  |> label("escaped_atom")
```
Is there something that I am missing or is this a class of grammar that NimbleParsec does not support?
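Outside NimbleParsec, Elixir's `Regex` module is backed by PCRE, so the backreference approach from the question does work there. One caveat: the pattern as written only recognizes the form; to extract `s`, the repeated part needs its own capturing group. The sketch below makes that change and anchors the match at the start of the input with `\A`. `DollarAtomRegex` is a hypothetical module name, and, like the original pattern, it only accepts `\w` characters as the delimiter.

```elixir
defmodule DollarAtomRegex do
  # Returns {:ok, name} or :error.
  def read(input) do
    # The question's pattern, anchored with \A, with the repeated part wrapped
    # in a second capturing group so that `s` can be extracted.
    pattern = ~r/\A\$\$(\w)((?:(?!\1).){1,30})\1/

    case Regex.run(pattern, input, capture: :all_but_first) do
      [_delimiter, name] -> {:ok, name}
      nil -> :error
    end
  end
end
```

Here `DollarAtomRegex.read("$$XFOOX")` returns `{:ok, "FOO"}`, and inputs such as `"$$XFOO"` (no closing delimiter) or `"$$XX"` (empty `s`) return `:error`.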
Not officially, but there is a hack you can do. You can use `|> parsec(:foo)` to say it will call the combinator defined as `:foo`, but then you implement said combinator by hand. You can use the `debug: true` flag in `defparsec` to get a glance at what this combinator function should look like and then add your own. The docs are here: https://hexdocs.pm/nimble_parsec/NimbleParsec.html#parsec/2
Perhaps we will support official ways of doing so in the future, especially because in your case the implementation of the additional parsing is really straightforward.
@davidarnold I just pushed a feature that allows this. See the tests in the commit above for an example!
@josevalim Thank you very much! It took me a bit to understand fully, but I got this push-back style working.
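```elixir
# Module-level code; assumes `import NimbleParsec` in the enclosing module.
bcd_range = [?\s, ?$, ?(, ?), ?*, ?+, ?,, ?-, ?., ?/, ?0..?9, ?=, ?A..?Z]

# Receives the delimiter `d` just parsed. Splits the remaining input at the
# next `d`, drops it, and pushes back `inner <> <<0>> <> rest` so a NUL byte
# marks where `s` ends. The delimiter itself is discarded from the results.
defp scan_arbitrary_atom(rest, [character], context, _line, _offset) do
  case String.split(rest, character, parts: 2) do
    [inner, rest] -> {inner <> <<0>> <> rest, [], context}
    [_] -> {:error, "no matching #{character} character found"}
  end
end

arbitrary_atom =
  ignore(string("$$"))
  # Parse the one-character delimiter, then rewrite the input via the callback.
  |> post_traverse(ascii_string(bcd_range, 1), :scan_arbitrary_atom)
  # `s`: 1 to 30 BCD characters, now terminated by the pushed-back NUL.
  |> ascii_string(bcd_range, min: 1, max: 30)
  |> ignore(string(<<0>>))
  |> label("arbitrary atom")
```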
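The push-back trick works because, once the callback has replaced the closing delimiter with a NUL byte, the rest of the grammar no longer depends on what `d` was. The following library-free sketch mirrors those two stages with plain functions, assuming ASCII input and the same characters as `bcd_range`. `PushBackSketch` and its functions are hypothetical names for illustration, not part of NimbleParsec.

```elixir
defmodule PushBackSketch do
  # Same characters as `bcd_range`, written out so it can be used in guards.
  @bcd ~c" $()*+,-./0123456789=ABCDEFGHIJKLMNOPQRSTUVWXYZ"

  # Stage 1 (the post_traverse callback): with "$$" and the opening delimiter
  # already consumed, replace the closing delimiter with a NUL sentinel.
  def push_back(rest, delimiter) do
    case String.split(rest, delimiter, parts: 2) do
      [inner, after_delimiter] -> {:ok, inner <> <<0>> <> after_delimiter}
      [_] -> {:error, "no matching #{delimiter} character found"}
    end
  end

  # Stage 2 (the fixed grammar): 1 to 30 BCD characters, then the NUL.
  def scan_name(input), do: scan_name(input, [], 0)

  defp scan_name(<<0, rest::binary>>, acc, count) when count >= 1,
    do: {:ok, acc |> Enum.reverse() |> List.to_string(), rest}

  defp scan_name(<<char, rest::binary>>, acc, count) when char in @bcd and count < 30,
    do: scan_name(rest, [char | acc], count + 1)

  defp scan_name(_input, _acc, _count),
    do: {:error, "expected 1 to 30 BCD characters followed by the NUL sentinel"}

  # Both stages together: "$$", a one-character delimiter, then the rest.
  def read(<<"$$", delimiter::binary-size(1), rest::binary>>) do
    with {:ok, pushed_back} <- push_back(rest, delimiter), do: scan_name(pushed_back)
  end

  def read(_input), do: {:error, "expected $$ followed by a delimiter"}
end
```

Splitting the work this way keeps the context-sensitive part (finding the matching `d`) in one small function, while the length and character-set checks stay in an ordinary fixed grammar. For example, `PushBackSketch.read("$$/A B/ NEXT")` returns `{:ok, "A B", " NEXT"}`.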