leostera / caramel

:candy: a functional language for building type-safe, scalable, and maintainable applications

Home Page: https://caramel.run

parsing erlang terms

progman1 opened this issue

I run Erlang.Parse.from_file on
https://github.com/erlang/otp/blob/master/lib/wx/api_gen/wxapi.conf

and get the error

failed: In wxapi.conf.copy, at offset 820: syntax error.
  Erlang__Erl_parser.MenhirBasics.Error

Probably because the file defines terms to be read by file:consult/1,
so it is not suitable input for the main entry point of your parser.
But with a different entry point, could it parse terms?

Could you show me the file you're trying to parse?

Or an equivalent file that also breaks like this?

That'd help me see if there's anything that I know is currently unsupported by the Menhir parser or if we need to spend some time digging.

Thanks for opening the issue! 🙌🏼

The link to it is above, but here's an excerpt:

%% %CopyrightEnd%

{const_skip, [wxGenericFindReplaceDialog, wxInvalidDateTime, wxLANGUAGE_KHMER]}.
{not_const,
 [wxRETAINED,
  %% New enums needed for gl contexts not static numbers
  {'wx_GL_COMPAT_PROFILE',   {test_if, "wxCHECK_VERSION(3,1,0)"}},
]}.

Oh, sorry, I missed the link.

I think the parser will have trouble parsing that, since it's built to parse an entire Erlang module. I started the tree-sitter-erlang project to address some of these limitations, but I haven't yet integrated it into the erlang library.

You could try using that tree-sitter parser with something like ocaml-tree-sitter to get up and running. Otherwise, I'd be happy to help you either integrate tree-sitter-erlang into the erlang library or rework the Menhir parser, as we just landed a new AST here that is waiting to be used.

I don't fully understand!
Terms are part of the erlang language aren't they?
What does the newest erl-parsetree.ml have over the old one?
I saw that the parser as-is had just the one entry point (very reasonably :).
And I imagined that another entry point into the grammar could be added,
one directly to a 'Terms' rule.
Which may not be true if 'Term' syntax is not part of the erlang language itself....

You have the incremental Menhir parser definition - how come you're going
after tree-sitter?

FYI, after staring at the format of wxapi.conf for a while, I got the impression it
may not be a very regular syntax - a sort of lists of lists of lists affair that's OK for
Erlang's dynamic typing approach. That suggested to me that I maybe shouldn't start hacking a yacc grammar for it! It also suggests to me that it isn't part of the Erlang language as such, since you already have a Menhir grammar for Erlang. Unfortunately, I can't remember the limitations of LALR/LR grammars.

What's your understanding?
thanks.

@progman1 let me try to answer your questions :)

Terms are part of the erlang language aren't they?

Yes, they are.

And I imagined that another entry point into the grammar could be added,
one directly to a 'Terms' rule.

We could make a new parser that reuses the expression language from the main parser, yes. This is because Menhir allows only one %start entrypoint.

how come you're going after tree-sitter?

The Menhir parser is only directly usable within OCaml code; the Tree-sitter parser can be used anywhere with tree-sitter bindings. That includes Rust libraries, Neovim, and GitHub Semantic. The Erlang community benefits more widely from this.


The lowest hanging fruit here would be to refactor erl_parser.mly into 2 parsers: erl_expr_parser.mly and erl_mod_parser.mly. Caramel then continues to rely on Erlang.Parser.module_from_file/1, and you get a new Erlang.Parser.terms_from_file/1 that you can use to lift your config file into an Erlang.Ast.literal list.
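To make the terms_from_file idea concrete, here is a minimal, self-contained OCaml sketch of a reader for dot-terminated terms. Everything in it is an assumption for illustration: the `term` type only stands in for whatever Erlang.Ast.literal actually looks like, `parse_terms` is a hypothetical name, and it covers only atoms, non-negative integers, strings without escapes, tuples, and lists. It is hand-written and not derived from erl_parser.mly.

```ocaml
(* Hypothetical term type; the real Erlang.Ast.literal may differ. *)
type term =
  | Atom of string
  | Int of int
  | Str of string
  | Tuple of term list
  | List of term list

(* Carries the byte offset at which parsing failed. *)
exception Syntax_error of int

let parse_terms (src : string) : term list =
  let n = String.length src in
  let pos = ref 0 in
  let peek () = if !pos < n then Some src.[!pos] else None in
  let fail () = raise (Syntax_error !pos) in
  (* Skip whitespace and %-comments that run to the end of the line. *)
  let rec skip () =
    match peek () with
    | Some (' ' | '\t' | '\n' | '\r') -> incr pos; skip ()
    | Some '%' ->
      while !pos < n && src.[!pos] <> '\n' do incr pos done;
      skip ()
    | _ -> ()
  in
  (* Consume the longest run of characters satisfying [ok]. *)
  let take_while ok =
    let start = !pos in
    while !pos < n && ok src.[!pos] do incr pos done;
    String.sub src start (!pos - start)
  in
  (* Read a quoted atom or string body; escapes are not handled. *)
  let quoted q =
    incr pos;
    let s = take_while (fun c -> c <> q) in
    if !pos >= n then fail ();
    incr pos;
    s
  in
  let is_digit = function '0' .. '9' -> true | _ -> false in
  let is_atom_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '@' -> true
    | _ -> false
  in
  let rec term () =
    skip ();
    match peek () with
    | Some '{' -> incr pos; Tuple (items '}')
    | Some '[' -> incr pos; List (items ']')
    | Some '"' -> Str (quoted '"')
    | Some '\'' -> Atom (quoted '\'')
    | Some ('0' .. '9') -> Int (int_of_string (take_while is_digit))
    | Some ('a' .. 'z') -> Atom (take_while is_atom_char)
    | _ -> fail ()
  (* Comma-separated terms up to [close]; a dangling comma is rejected. *)
  and items close =
    skip ();
    if peek () = Some close then (incr pos; [])
    else
      let rec loop acc =
        let t = term () in
        skip ();
        match peek () with
        | Some ',' -> incr pos; loop (t :: acc)
        | Some c when c = close -> incr pos; List.rev (t :: acc)
        | _ -> fail ()
      in
      loop []
  in
  (* A file is a sequence of terms, each terminated by a dot. *)
  let rec all acc =
    skip ();
    if !pos >= n then List.rev acc
    else begin
      let t = term () in
      skip ();
      if peek () = Some '.' then incr pos else fail ();
      all (t :: acc)
    end
  in
  all []
```

As written, the sketch raises `Syntax_error` on a dangling comma such as `[a,]`, so input containing one would need either a relaxed `items` rule or some preprocessing.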

The strong path forward is to do some work and integrate tree-sitter-erlang back into this repository, to use that as the term parser first. If that works, it'll be easier to start migrating the main parser to it.

thanks for clarifying.
I will tackle the low-hanging fruit! I have done some messing with menhir and something
might be doable about entry points via converting to ocamlyacc grammar first, for an even lower hang!

I have a parsed file :)
Happily, Menhir does actually accept more than one start symbol.
I had to allow dangling commas in tuples and lists - maybe that isn't valid expression language after all? (I don't know if the 'term' language is any different from expressions.)
The file also had multi-line strings, which I took to mean they should be stuck back together
(macro stringification?), so there is a change there too.

if these are actually valid erlang then I'm happy to send up the patch?
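On the multi-line strings: Erlang concatenates adjacent string literals, so "abc" "def" denotes the same string as "abcdef". A minimal OCaml sketch of that merge step over a token list follows. The `token` type and the function name are hypothetical, chosen for illustration, and are not taken from the erlang library.

```ocaml
(* Hypothetical token type: string literals versus everything else. *)
type token =
  | String_lit of string
  | Other_tok of string

(* Collapse every run of adjacent string literals into one literal. *)
let rec merge_adjacent_strings (toks : token list) : token list =
  match toks with
  | String_lit a :: String_lit b :: rest ->
    (* Re-examine the merged literal so runs of three or more collapse too. *)
    merge_adjacent_strings (String_lit (a ^ b) :: rest)
  | tok :: rest -> tok :: merge_adjacent_strings rest
  | [] -> []
```

Doing the merge on tokens, before parsing, keeps the grammar itself unchanged; handling it with a grammar rule for one or more consecutive strings would be an equally reasonable choice.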

Well, I stand corrected! 🙌🏼 I didn't know that, thanks for showing me. Please send a patch 🎉 we can discuss the changes on the PR.