jemc / crystal-pegmatite

A high-performance Parsing Expression Grammar (PEG) library for the Crystal language. :gem: :capital_abcd:

DSL: Allow builtin types to be used with (EDIT: a more concise wrapper)

watzon opened this issue

I love this so far, just a couple of things that would be nice to have. Currently ranges, characters, and strings require a wrapper (namely range, char, or string) in order to use them in a grammar definition; it would be nice if we could just use literals.

Also, have you put any thought into actually writing a grammar format so that grammars can be written as plain text files?

> I love this so far

❤️

> Also, have you put any thought into actually writing a grammar format so that grammars can be written as plain text files?

My personal preference is toward the current DSL that fits within Crystal as its host language. I like it because I can count on any features of Crystal being available and working as expected, giving me power to make abstractions if I need to.

That said, if you (or anyone else) have a strong preference toward building a separate language grammar that could be used for defining language grammars, I'm open to including such a feature in the library, as long as it doesn't replace or impair the existing in-Crystal DSL.

> Currently ranges, characters, and strings require a wrapper (namely range, char, or string) in order to use them in a grammar definition; it would be nice if we could just use literals.

Hmmmm.. this would be kind of tricky to pull off cleanly while still living within the in-Crystal DSL. The issue is that the type of an operator's left-hand operand determines which implementation of the operator sugar method gets called, which creates a problem for cases where the left-hand side would be a raw string, range, or character literal.

That is, we could easily make this work:

foobar = s >> "foo" >> s >> "bar"

But this falls flat because the first object in the chain is of type String, which has no >> method:

foobar = "foo" >> s >> "bar" >> s
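To make the dispatch problem concrete, here is a minimal standalone sketch. The `Pattern`, `Literal`, and `Sequence` classes are toy stand-ins invented for illustration, not Pegmatite's real classes. It shows that `a >> b` is just a call to `a.>>(b)`, so a raw string is only wrapped automatically when it sits on the right-hand side.

```crystal
# Toy stand-ins for grammar patterns (hypothetical, not Pegmatite's API).
abstract class Pattern
  # `a >> b` is sugar for `a.>>(b)`, so this method is only found
  # when the left-hand operand is already a Pattern.
  def >>(other : Pattern) : Pattern
    Sequence.new(self, other)
  end

  # Overload that wraps a raw String on the right-hand side.
  def >>(other : String) : Pattern
    self >> Literal.new(other)
  end
end

class Literal < Pattern
  getter text : String

  def initialize(@text : String)
  end
end

class Sequence < Pattern
  getter left : Pattern
  getter right : Pattern

  def initialize(@left : Pattern, @right : Pattern)
  end
end

s = Literal.new(" ")

# Works: every >> in the chain is called on a Pattern.
ok = s >> "foo" >> s >> "bar"

# Does not compile: the first >> would be called on a String,
# and String defines no such method.
# broken = "foo" >> s >> "bar" >> s
```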

One way around this would be to introduce a "precursor" object that matches nothing and does nothing other than prepare to wrap the following literal.

foobar = precursor >> "foo" >> s >> "bar" >> s

But this feels inconsistent and potentially confusing, especially when you account for the fact that any parenthesized expression would also need the same precursor marker as the first item in the parenthesized expression.

foobar = precursor >> "foo" >> s >> (precursor | "bar" | "baz") >> s
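A rough sketch of how such a precursor could behave, again using invented toy classes (`Pattern`, `Literal`, `Sequence`, `Choice`, and `Precursor` are all hypothetical names). The precursor contributes nothing to the grammar; its operators simply return the wrapped right-hand literal, so the rest of the chain is called on a real pattern.

```crystal
# Toy pattern classes (hypothetical, not Pegmatite's API).
abstract class Pattern
  def >>(other : Pattern) : Pattern
    Sequence.new(self, other)
  end

  def >>(other : String) : Pattern
    self >> Literal.new(other)
  end

  def |(other : Pattern) : Pattern
    Choice.new(self, other)
  end

  def |(other : String) : Pattern
    self | Literal.new(other)
  end
end

class Literal < Pattern
  getter text : String

  def initialize(@text : String)
  end
end

class Sequence < Pattern
  getter left : Pattern
  getter right : Pattern

  def initialize(@left : Pattern, @right : Pattern)
  end
end

class Choice < Pattern
  getter left : Pattern
  getter right : Pattern

  def initialize(@left : Pattern, @right : Pattern)
  end
end

# Hypothetical precursor: not a Pattern itself. Its operators discard
# the precursor and hand back the wrapped right-hand literal.
class Precursor
  def >>(other : String) : Pattern
    Literal.new(other)
  end

  def |(other : String) : Pattern
    Literal.new(other)
  end
end

precursor = Precursor.new
s = Literal.new(" ")

foobar = precursor >> "foo" >> s >> (precursor | "bar" | "baz") >> s
```

Note that the parenthesized group needs its own `precursor`, which is exactly the inconsistency described above.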

Alternatively, we could monkey-patch String and Char and Range, but... let's not 😁

Overall, I'm not sure if there is a good approach for this, but I'm open to hearing ideas if there is something I've missed.

Yeah it seems like monkey patching could be the only alternative, which wouldn't be terrible if it could be patched just while inside the DSL. I don't know how possible that is though.
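For reference, a sketch of what the monkey-patching route would look like, with hypothetical toy classes standing in for the real patterns. Reopening `String` in Crystal affects every `String` in the whole program; as far as I know the language has no scoped equivalent of Ruby's refinements, so limiting the patch to the inside of the DSL block does not appear to be possible this way.

```crystal
# Toy pattern classes (hypothetical, not Pegmatite's API).
abstract class Pattern
  def >>(other : Pattern) : Pattern
    Sequence.new(self, other)
  end
end

class Literal < Pattern
  getter text : String

  def initialize(@text : String)
  end
end

class Sequence < Pattern
  getter left : Pattern
  getter right : Pattern

  def initialize(@left : Pattern, @right : Pattern)
  end
end

# Reopening String: this adds >> to every String in the program,
# not just to strings used inside a grammar definition.
class String
  def >>(other : Pattern) : Pattern
    Literal.new(self) >> other
  end
end

s = Literal.new(" ")

# Now compiles, because String itself responds to >>.
foo_then_space = "foo" >> s
```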

The only other approach I can think of that is worth mentioning would be to make the wrapper more concise.

For example, we could have a method called l, shorthand for "literal", that takes a string, char, or range, dispatches on its type, and gives you the correct object on the other side.

foobar = l("foo") >> s >> l("bar") >> s
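One way the `l` shorthand could be sketched is with Crystal method overloads, one per literal type. The three pattern classes below are hypothetical placeholders, and only the name `l` comes from the suggestion above.

```crystal
# Toy pattern classes (hypothetical, not Pegmatite's API).
abstract class Pattern
end

class Literal < Pattern
  getter text : String

  def initialize(@text : String)
  end
end

class SingleChar < Pattern
  getter char : Char

  def initialize(@char : Char)
  end
end

class CharRange < Pattern
  getter range : Range(Char, Char)

  def initialize(@range : Range(Char, Char))
  end
end

# One overload per supported literal type; the compiler picks the
# matching overload from the argument's type.
def l(value : String) : Pattern
  Literal.new(value)
end

def l(value : Char) : Pattern
  SingleChar.new(value)
end

def l(value : Range(Char, Char)) : Pattern
  CharRange.new(value)
end

word = l("foo")      # wraps a String
letter = l('x')      # wraps a Char
lower = l('a'..'z')  # wraps a Range(Char, Char)
```

Passing any other type, such as `l(42)`, is rejected at compile time because no overload matches.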

Now that wouldn't be a bad idea. It would be nice to have a regex literal thrown in there too, but I can also see the reasons for not having that. It would make matches like this /[^\0\n\\]/ a little more concise though. Now if the regex literal were to generate its own parse tree...

Btw if you are on IRC or gitter I'd love to chat about this. I have some questions that don't really fit into an issue and would like to help with this eventually by writing some documentation. My username is the same for both.

> It would be nice to have a regex literal thrown in there too, but I can also see the reasons for not having that

I'm fine with adding regular expression matching, as long as we don't have to support captures. That would muddy the waters too much.
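As a rough idea of what capture-free regex matching could look like, the hypothetical helper below reports only how many characters a regex consumes at a given offset and ignores any capture groups. The name `match_length` and the approach of prefixing `\G` to pin the match to the starting offset are assumptions made for this sketch, not anything in the library.

```crystal
# Hypothetical helper: returns the number of characters `regex` matches
# starting exactly at `offset` in `source`, or nil if it does not match there.
def match_length(regex : Regex, source : String, offset : Int32) : Int32?
  # \G asserts that the match starts at the offset passed to #match,
  # so the regex cannot skip ahead to a later position.
  # The (?:...) wrapper keeps top-level alternations intact.
  anchored = Regex.new("\\G(?:#{regex.source})", regex.options)

  if m = anchored.match(source, offset)
    # Only the whole match is used; capture groups are ignored.
    m[0].size
  end
end

digits = /[0-9]+/
at_start = match_length(digits, "12ab", 0) # digits begin at offset 0
at_alpha = match_length(digits, "12ab", 2) # offset 2 is a letter
```

A pattern built on this would only need to advance the parse position by the returned length, which keeps regex support from leaking captures into the parse tree.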

> Now if the regex literal were to generate its own parse tree...

😨

Yeah, I don't see any specific reason for captures, but being able to match on a regex literal would be pretty nice.

I've edited the title slightly to reflect this approach, and applied a "help wanted" label to signal that this is defined enough for someone to work on this.

Awesome. Maybe I'll take a swing at it at some point. More worried about #3 for now.