Started working on ANTLR

MoriTanosuke opened this issue 11 years ago · comments

Hey, I just started to work on an ANTLR version of jtoml, mostly for educational purposes to learn ANTLR. It's my first hands-on with a grammar, so I expect to make a lot of dumb mistakes, but maybe you're interested in my current code: https://github.com/MoriTanosuke/jtoml/tree/antlr

I try to push the branch to github whenever I have a major change in the grammar or the application code.

Alexandre Grison commented 11 years ago

Done

Alexandre Grison · Answer 1 · Mon Feb 25 2013 19:09:54 GMT+0800 (China Standard Time)

Hi,

I've only worked with ANTLR once, so my experience is limited. Besides, it was not a full grammar, just expressions with tokens.

For now the current parser works but has limitations for sure, so in the end it would probably be best if we can come up with a grammar for TOML. It could also serve for other languages.

I will take a look at your code this evening.
Thank you

Moandji Ezana · Answer 2 · Mon Feb 25 2013 20:48:19 GMT+0800 (China Standard Time)

I've also started working on a parser, but using Parboiled, which is simpler and leaner (in terms of dependencies). You can see it here: https://github.com/mwanji/jtoml/blob/peg_parser/src/main/java/me/grison/jtoml/TomlPegParser.java

Alexandre Grison · Answer 3 · Mon Feb 25 2013 22:09:41 GMT+0800 (China Standard Time)

@mwanji Wow, I came across Parboiled once in the past but had forgotten about it. That's cool.

I'm thinking about introducing a Strategy which will let the user use whatever parser he wants (ANTLR, Parboiled or the default one), depending on the dependencies he'd like to have.
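
To make the Strategy idea concrete, here is a minimal sketch. It assumes a hypothetical `parse(String)` method on `TomlParser` and a toy `SimpleTomlParser` that only understands `key = value` lines; the real jtoml interface and parser may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrategySketch {
    /** Hypothetical strategy interface; the real jtoml signature may differ. */
    public interface TomlParser {
        Map<String, Object> parse(String toml);
    }

    /** Toy stand-in for the default parser: handles only `key = value` lines. */
    public static class SimpleTomlParser implements TomlParser {
        @Override
        public Map<String, Object> parse(String toml) {
            Map<String, Object> result = new LinkedHashMap<>();
            for (String line : toml.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    // Values are kept as raw strings; no TOML typing is attempted.
                    result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
                }
            }
            return result;
        }
    }

    /** Front end that delegates to whichever strategy it was given. */
    public static class Toml {
        private final TomlParser parser;

        public Toml(TomlParser parser) {
            this.parser = parser;
        }

        public Map<String, Object> parse(String toml) {
            return parser.parse(toml);
        }
    }

    public static void main(String[] args) {
        // Swapping in an ANTLR- or Parboiled-based TomlParser would only change this line.
        Toml toml = new Toml(new SimpleTomlParser());
        System.out.println(toml.parse("title = example\nport = 8080"));
    }
}
```

The front end never knows which parser it holds, so an ANTLR or Parboiled implementation could live in its own artifact with its own dependencies.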

Moandji Ezana · Answer 4 · Mon Feb 25 2013 22:14:02 GMT+0800 (China Standard Time)

Interesting. Would all parsing-related dependencies be in scope provided?

I'm not sure I see much advantage, compared to the loss of ease of use
and the increased maintenance burden.

Alexandre Grison · Answer 5 · Mon Feb 25 2013 23:15:39 GMT+0800 (China Standard Time)

I imagine it could be an advantage for big files or for performance reasons to change the parser.

It would need benchmarks to see which one is the quickest, but suppose the ANTLR parser outperforms both the Parboiled one and the built-in one: the user could then switch parsers dynamically if he knows he's going to load a big file.

Besides, if the Maven dependencies are marked as provided, a user who already has Parboiled in his dependencies would surely want to use the Parboiled parser if he doesn't really need ANTLR. The same applies to a user already using ANTLR.

I cannot find another good reason except those two.

He who can do more can do less, don't you think?

Moandji Ezana · Answer 6 · Tue Feb 26 2013 00:53:18 GMT+0800 (China Standard Time)

> I imagine it could be an advantage for big files or for performance reasons to change the parser.

I think the use case for TOML is configuration, not data transfer, so it would replace .properties files (is YAML ever used for data transfer?). If this is the case, then large files aren't much of a concern.

> Besides, if the Maven dependencies are marked as provided, a user who already has Parboiled in his dependencies would surely want to use the Parboiled parser if he doesn't really need ANTLR. The same applies to a user already using ANTLR.

In that case, alternative parsers should probably be in a different artifact, e.g. jtoml-parboiled or jtoml-antlr. However, you could make it a little more user-friendly by using the ServiceLoader to automatically pick up the TomlParser from the classpath if none is specified.
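
A rough sketch of that lookup, assuming a hypothetical `TomlParser` interface with a single `name()` method and a stand-in `BuiltinTomlParser` (neither is the actual jtoml code). Only `java.util.ServiceLoader` itself is real API here.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class ParserDiscovery {
    /** Hypothetical service interface; jtoml's real one is me.grison.jtoml.TomlParser. */
    public interface TomlParser {
        String name();
    }

    /** Stand-in for the dependency-free default parser. */
    public static class BuiltinTomlParser implements TomlParser {
        @Override
        public String name() {
            return "builtin";
        }
    }

    /** Returns the first parser registered on the classpath, or the built-in one if none is. */
    public static TomlParser discover() {
        // ServiceLoader looks for META-INF/services/<binary name of the interface> files.
        Iterator<TomlParser> found = ServiceLoader.load(TomlParser.class).iterator();
        return found.hasNext() ? found.next() : new BuiltinTomlParser();
    }

    public static void main(String[] args) {
        // Without any provider file on the classpath, the built-in parser is returned.
        System.out.println(discover().name());
    }
}
```

An add-on artifact such as jtoml-antlr would then only need to ship a provider file naming its implementation class; the core jar would not have to know about it.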

> He who can do more can do less, don't you think?

He who can do less can do more, but he who can do less creates less work for himself. The customisable parser feels very YAGNI and premature to me.

Alexandre Grison · Answer 7 · Tue Feb 26 2013 04:19:21 GMT+0800 (China Standard Time)

I agree with you about the multiple artifacts and about there being no need for large files at the moment, but to me it does not feel YAGNI yet. I mean, the library doesn't feel bloated with insane features.

I was also thinking about using ServiceLoader to get an implementation available on the classpath and fall back to the default if needed. It seems to be an elegant way to let the user choose without configuration.

I'm not against making the Parboiled or ANTLR parser the default one and making the other one a supplementary artifact, depending on which is best.

Carsten · Answer 8 · Tue Feb 26 2013 04:32:52 GMT+0800 (China Standard Time)

Hm, well. My main concern is creating an ANTLR grammar for TOML. I try to avoid language- or implementation-specific stuff in the grammar, but because of my limited knowledge of ANTLR I might end up with tightly coupled code first.

Maybe MoriTanosuke@bf8b13a helps with providing something like a plugin mechanism to dynamically load implementations of TomlParsers.

However, you can check out my branch antlr and I really appreciate any kind of feedback that will help me get to a better approach with ANTLR. :-)

Alexandre Grison · Answer 9 · Tue Feb 26 2013 05:45:10 GMT+0800 (China Standard Time)

I have looked at your code a little, but I'm no ANTLR guru, so I will need more time before I can tell you something interesting about it.

I already split the interface during some refactoring (see commit acc7bac), and I think I will implement something with the ServiceLoader tomorrow, defaulting to the current default parser implementation BuiltinTomlParser.java (which could be renamed to BasicTomlParser or SimpleTomlParser like you did) until a better one gets committed (Parboiled/ANTLR).

Thank you both for your comments 👍

Carsten · Answer 10 · Tue Feb 26 2013 15:43:55 GMT+0800 (China Standard Time)

No problem, I think I will change bits of the grammar whenever my level of understanding improves. I am reading http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference to get started with ANTLR v4.

I'll pull acc7bac first to refactor my interfaces to match your code. Then I can pull again once you have something working with the ServiceLoader to give it a test run with my ANTLR code. :-)

Alexandre Grison · Answer 11 · Tue Feb 26 2013 16:10:33 GMT+0800 (China Standard Time)

I will commit around noon (GMT) 😄

Don Redhorse · Answer 12 · Wed Feb 27 2013 05:12:42 GMT+0800 (China Standard Time)

nice work... I still hope though that you keep the "Simple" approach too without dependency on ANTLR etc..

atm I like the small footprint of these classes, hope that also the validator goes through...

Alexandre Grison · Answer 13 · Wed Feb 27 2013 05:38:34 GMT+0800 (China Standard Time)

Yes, as discussed above I think that the best approach is to have a single jar with no dependencies whatsoever, and additional ones for those who want a specific parser used.

It should not be that difficult to add a validation feature to the current code (depending on what level of validation we're talking about).

Carsten · Answer 14 · Wed Feb 27 2013 14:36:19 GMT+0800 (China Standard Time)

I think it's possible to ship jtoml without all the additional jars for ANTLR and have the ServiceLoader discover and auto-load everything needed for a new implementation, right? At least that's how I read 826b651

So if I want to use ANTLR with jtoml I can run it with some additional jars and have it auto-discover the new implementation. How do I tell it to use the new implementation? Do I have to overwrite a property me.grison.jtoml.TomlParser from 826b651#L2L-1 on the command line? Or do I simply replace the already existing jar with the default implementation with another jar?

http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html has a good explanation of ServiceLoader, but not much practical advice...

Moandji Ezana · Answer 15 · Wed Feb 27 2013 14:58:21 GMT+0800 (China Standard Time)

> So if I want to use ANTLR with jtoml I can run it with some additional
> jars and have it auto-discover the new implementation. How do I tell it to
> use the new implementation?

If the ServiceLoader discovers an implementation, it should be used
automatically, unless you provide your own instance when parsing.

Alexandre Grison · Answer 16 · Wed Feb 27 2013 15:48:12 GMT+0800 (China Standard Time)

> So if I want to use ANTLR with jtoml I can run it with some additional jars and have it auto-discover the new implementation. How do I tell it to use the new implementation? Do I have to overwrite a property me.grison.jtoml.TomlParser from 826b651#L2L-1 on the command line? Or do I simply replace the already existing jar with the default implementation with another jar?

Normally you should just create a file named me.grison.jtoml.TomlParser in the META-INF/services/ folder of your JAR (with content foo.bar.toml.AntlrTomlParser, for example). The ServiceLoader will detect that two files have the same name (the built-in one and the one in your additional JAR) and will provide both TomlParser instances. The Toml class iterates over them and takes any implementation that is not the built-in one, falling back to the built-in one if no additional implementation can be found. It also throws an exception if more than two additional implementations are found.
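
The selection rule could look roughly like the sketch below. It is not the code from the commit: the empty `TomlParser` interface, `AntlrTomlParser` and `PegTomlParser` are hypothetical placeholders, and the sketch assumes the intent is to reject any ambiguity, so it throws as soon as more than one additional parser is present. In real use the `Iterable` would be the result of `ServiceLoader.load(TomlParser.class)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParserSelection {
    /** Hypothetical marker interface standing in for me.grison.jtoml.TomlParser. */
    public interface TomlParser { }

    /** Stand-in for the built-in parser shipped in the core jar. */
    public static class SimpleTomlParser implements TomlParser { }

    /** Hypothetical add-on parsers, as an extra JAR might provide them. */
    public static class AntlrTomlParser implements TomlParser { }
    public static class PegTomlParser implements TomlParser { }

    /** Prefers the single non-built-in parser; falls back to the built-in one. */
    public static TomlParser select(Iterable<TomlParser> discovered) {
        List<TomlParser> extras = new ArrayList<>();
        for (TomlParser parser : discovered) {
            if (!(parser instanceof SimpleTomlParser)) {
                extras.add(parser);
            }
        }
        if (extras.isEmpty()) {
            return new SimpleTomlParser();
        }
        if (extras.size() > 1) {
            // Assumption: more than one add-on is treated as an error.
            throw new IllegalStateException("Ambiguous TomlParser implementations: " + extras.size());
        }
        return extras.get(0);
    }

    public static void main(String[] args) {
        List<TomlParser> found = Arrays.asList(new SimpleTomlParser(), new AntlrTomlParser());
        System.out.println(select(found).getClass().getSimpleName());
    }
}
```

Keeping the rule in a method that takes an `Iterable` makes it testable without putting provider files on the classpath.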

Moandji Ezana · Answer 17 · Wed Feb 27 2013 16:11:36 GMT+0800 (China Standard Time)

The core jar shouldn't provide a service. If it finds none, then use the
built-in, if it finds one service, use it. Since ServiceLoader scans the
classpath, it should probably only be invoked once, if possible.

I'm not sure how to handle multiple services. Forcing the user to specify
is safer, but might be annoying? You could also argue that if the user
doesn't specify, they don't care, so just use the first one.

Alexandre Grison · Answer 18 · Wed Feb 27 2013 16:22:36 GMT+0800 (China Standard Time)

The core jar does have the Toml frontend class that uses a TomlParser internally. It falls back correctly to the default parser SimpleTomlParser if it cannot find an additional TomlParser implementation, but it uses any other one if it finds it.
All this is done in a static initializer (see the file) in the Toml frontend class, so that it's done only once, the first time Toml is referenced.
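
As an illustration of the "only once" behaviour, here is a small sketch. The `LOOKUPS` counter and the string standing in for the parser are made up for the demo; the point is only that the JVM runs a class's static initializer a single time, on first use of the class.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitSketch {
    /** Demo-only counter recording how often the lookup ran. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Stand-in for the Toml frontend class. */
    static class Toml {
        private static final String PARSER_NAME;

        static {
            // In jtoml this is where the ServiceLoader scan would happen (assumption).
            LOOKUPS.incrementAndGet();
            PARSER_NAME = "SimpleTomlParser";
        }

        static String parserName() {
            return PARSER_NAME;
        }
    }

    public static void main(String[] args) {
        Toml.parserName();
        Toml.parserName();
        System.out.println(LOOKUPS.get()); // prints 1
    }
}
```

Class initialization is also thread-safe by specification, so no extra locking is needed around the lookup.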

Carsten · Answer 19 · Wed Feb 27 2013 17:34:22 GMT+0800 (China Standard Time)

Ok, I think I understood. 👍 I'll try to pull your changes and give it a try with my ANTLR parser as an additional TomlParser.

Carsten · Answer 20 · Fri Mar 01 2013 14:45:33 GMT+0800 (China Standard Time)

After pulling your modifications and having a look I think it's better if I delete my fork of jtoml and create a new repository to build a pure JAR project with my parser and the dependencies needed to run it. Then my JAR can be added to the classpath when running the frontend class from your jtoml.

Carsten · Answer 21 · Fri Mar 01 2013 15:31:32 GMT+0800 (China Standard Time)

Ok, I pushed my new repository to GitHub now: https://github.com/MoriTanosuke/jtoml-antlr It builds, but my JUnit tests are still failing.

Alexandre Grison · Answer 22 · Fri Mar 01 2013 18:29:20 GMT+0800 (China Standard Time)

Ok, I will reference your project in the README today when I have some time.
I just checked the repository, and what's missing is the file located in META-INF/services/ to declare your TomlParser instance.

Carsten · Answer 23 · Fri Mar 01 2013 18:45:09 GMT+0800 (China Standard Time)

Ah, my parser is not really working at the moment so you should mark it as work in progress.

I added the file for my parser instance.