dashbitco / nimble_parsec

A simple and fast library for text-based parser combinators

Is such a grammar supported?

davidarnold opened this issue

I'm trying to implement a parsing rule for a strange feature of LISP 1.5:

There is a provision for reading in atomic symbols containing arbitrary characters.

This is done by punching the form $$dsd, where s is any string of up to 30 characters, and d is any character not contained in the string s. Only the string s is used in forming the print name of the atomic symbol; d and the dollar signs will not appear when the atomic symbol is printed out.

So as an example, $$XFOOX should yield FOO.

This can be implemented with the regular expression \$\$(\w)(?:(?!\1).){1,30}\1, but I don't see any backreference-like construct in NimbleParsec. My best attempt is the following, but X is hard-coded, so it doesn't work for the general case.

  escaped_atom =
    ignore(string("$$"))
    |> ignore(string("X"))
    |> times(lookahead_not(string("X")) |> ascii_char([?0..?9, ?A..?Z]), min: 1, max: 30)
    |> ignore(string("X"))
    |> reduce({:to_string, []})
    |> label("escaped_atom")

Is there something that I am missing or is this a class of grammar that NimbleParsec does not support?
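
For reference, the backreference regex from the question can be tried directly with Elixir's built-in Regex module. This is only a sketch: the module name EscapedAtomRegex and the second capture group around s are additions for illustration (the pattern in the question captures only the delimiter d).

```elixir
defmodule EscapedAtomRegex do
  # Returns {:ok, s} when the input contains a $$dsd form, :error otherwise.
  def extract(input) do
    # Same pattern as in the question, with one change (an assumption made
    # here): a second capture group wraps s so it can be read back out.
    # \1 refers back to whatever character matched as the delimiter d.
    case Regex.run(~r/\$\$(\w)((?:(?!\1).){1,30})\1/, input) do
      [_whole, _delimiter, s] -> {:ok, s}
      nil -> :error
    end
  end
end
```

With this sketch, EscapedAtomRegex.extract("$$XFOOX") returns {:ok, "FOO"}, and an input with no closing delimiter such as "$$XFOO" returns :error.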

Not officially, but there is a hack you can use. You can write |> parsec(:foo) to call the combinator defined as :foo and then implement that combinator by hand. Pass the debug: true flag to defparsec to get a look at what the generated combinator functions look like, then add your own. The docs are here: https://hexdocs.pm/nimble_parsec/NimbleParsec.html#parsec/2

Perhaps we will support an official way of doing this in the future, especially because in your case the implementation of the additional parsing is really straightforward.
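
To illustrate how small the extra parsing step is, here is a rough sketch of the $$dsd scan written by hand in plain Elixir. It does not use the function signature that NimbleParsec generates for combinators (use debug: true to see that); the module name ArbitraryAtomScan, the return shapes, and the error messages are all hypothetical.

```elixir
defmodule ArbitraryAtomScan do
  # Hypothetical helper, not part of NimbleParsec: scans a leading $$dsd form
  # by hand and returns the print name s plus the unconsumed input.
  def scan(<<"$$", d::binary-size(1), rest::binary>>) do
    # Split at the first later occurrence of the delimiter d.
    case String.split(rest, d, parts: 2) do
      [s, remaining] when byte_size(s) in 1..30 -> {:ok, s, remaining}
      [_s, _remaining] -> {:error, "string must be 1 to 30 characters long"}
      [_no_delimiter] -> {:error, "no matching #{d} character found"}
    end
  end

  # Anything that does not start with $$ plus a delimiter is rejected.
  def scan(_input), do: {:error, "expected $$"}
end
```

For example, ArbitraryAtomScan.scan("$$XFOOX") returns {:ok, "FOO", ""}. Because the delimiter is bound at run time, the scan works for any single-byte d, which is the part the hard-coded X version could not express.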

@davidarnold I just pushed a feature that allows this. See the tests in the commit above for an example!

@josevalim Thank you very much! It took me a bit to understand fully, but I got this push-back style working.

  bcd_range = [?\s, ?$, ?(, ?), ?*, ?+, ?,, ?-, ?., ?/, ?0..?9, ?=, ?A..?Z]

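  # post_traverse callback: `character` is the delimiter d that was just parsed.
  # Splits the remaining input at the next d and swaps that d for a NUL byte
  # (<<0>>), which is outside bcd_range, so the following ascii_string stops there.
  # Returning [] as the accumulator drops d from the results.
  defp scan_arbitrary_atom(rest, [character], context, _line, _offset) do
    case String.split(rest, character, parts: 2) do
      [inner, rest] -> {inner <> <<0>> <> rest, [], context}
      [_] -> {:error, "no matching #{character} character found"}
    end
  end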

  arbitrary_atom =
    ignore(string("$$"))
    |> post_traverse(ascii_string(bcd_range, 1), :scan_arbitrary_atom)
    |> ascii_string(bcd_range, min: 1, max: 30)
    |> ignore(string(<<0>>))
    |> label("arbitrary atom")
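
For anyone who wants to try the snippet above, here is one way it might be wrapped into a runnable script. The module name ArbitraryAtomParser, the parser name :arbitrary_atom, and the "~> 1.3" version requirement are assumptions; the only firm requirement is a NimbleParsec release in which a post_traverse callback may return {rest, acc, context}.

```elixir
# Assumption: a NimbleParsec release that accepts {rest, acc, context}
# from post_traverse callbacks.
Mix.install([{:nimble_parsec, "~> 1.3"}])

defmodule ArbitraryAtomParser do
  import NimbleParsec

  bcd_range = [?\s, ?$, ?(, ?), ?*, ?+, ?,, ?-, ?., ?/, ?0..?9, ?=, ?A..?Z]

  arbitrary_atom =
    ignore(string("$$"))
    # Parse the delimiter d, then rewrite the rest of the input.
    |> post_traverse(ascii_string(bcd_range, 1), :scan_arbitrary_atom)
    |> ascii_string(bcd_range, min: 1, max: 30)
    # The closing d was replaced by a NUL byte; consume and discard it.
    |> ignore(string(<<0>>))
    |> label("arbitrary atom")

  defparsec :arbitrary_atom, arbitrary_atom

  # Replaces the next occurrence of the delimiter with <<0>> and drops the
  # delimiter itself from the accumulated results.
  defp scan_arbitrary_atom(rest, [character], context, _line, _offset) do
    case String.split(rest, character, parts: 2) do
      [inner, rest] -> {inner <> <<0>> <> rest, [], context}
      [_] -> {:error, "no matching #{character} character found"}
    end
  end
end

# The first element of the result is :ok and the second is ["FOO"].
IO.inspect(ArbitraryAtomParser.arbitrary_atom("$$XFOOX"))
```

Since the NUL byte has the same length as the delimiter it replaces, the reported byte offsets should still line up with the original input.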