agrison / jtoml

TOML for Java


Started working on ANTLR

MoriTanosuke opened this issue

Hey, I just started to work on an ANTLR version of jtoml, mostly for educational purposes to learn ANTLR. It's my first hands-on with a grammar, so I expect to make a lot of dumb mistakes, but maybe you're interested in my current code: https://github.com/MoriTanosuke/jtoml/tree/antlr

I try to push the branch to github whenever I have a major change in the grammar or the application code.

Hi,

I've only worked one time with ANTLR so my experience is limited. Besides it was not a full grammar but expressions with tokens.

For now the current parser works but has limitations for sure, so in the end it would probably be best if we can come up with a grammar for TOML. It could also serve for other languages.

I will take a look at your code this evening.
Thank you

I've also started working on a parser, but using Parboiled, which is simpler and leaner (in terms of dependencies). You can see it here: https://github.com/mwanji/jtoml/blob/peg_parser/src/main/java/me/grison/jtoml/TomlPegParser.java

@mwanji Wow, I've already seen Parboiled one time in the past but did not remember it. That's cool.

I'm thinking about introducing a Strategy which will let the user use whatever parser he wants (ANTLR, Parboiled or the default one), depending on the dependencies he'd like to have.
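To make the Strategy idea concrete, here is a minimal sketch. The `parse(String)` signature, the wrapper class `StrategySketch` and the toy `key = value` handling are assumptions for illustration only; they are not taken from the jtoml code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StrategySketch {
    /** Hypothetical strategy interface: turns TOML text into a map. */
    interface TomlParser {
        Map<String, Object> parse(String tomlString);
    }

    /** Stand-in for the dependency-free default; only handles flat `key = value` lines. */
    static class SimpleTomlParser implements TomlParser {
        @Override
        public Map<String, Object> parse(String tomlString) {
            Map<String, Object> result = new LinkedHashMap<String, Object>();
            for (String line : tomlString.split("\n")) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // skip blank lines and anything without '='
                }
                // Values are kept as raw strings; a real parser would type them.
                result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
            return result;
        }
    }

    /** Front end that delegates to whichever strategy the caller supplies. */
    static class Toml {
        private final TomlParser parser;

        Toml() {
            this(new SimpleTomlParser()); // default when the user does not choose
        }

        Toml(TomlParser parser) {
            this.parser = parser;
        }

        Map<String, Object> parse(String tomlString) {
            return parser.parse(tomlString);
        }
    }
}
```

An ANTLR or Parboiled based parser would then just be another `TomlParser` implementation passed to the second constructor, so the front end never needs to know which library is behind it.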

Interesting. Would all parsing-related dependencies be in scope provided?

I'm not sure I see much advantage compared to the loss of ease of use
and the increased maintenance burden.

I imagine it could be an advantage for big files or for performance reasons to change the parser.

It would need benchmarks to see which one is the quickest. But suppose the ANTLR parser outperforms both the Parboiled one and the built-in one: a user who knows he's going to load a big file could then switch parsers dynamically.

Besides, if the Maven dependencies are marked as provided, a user who already has Parboiled among his dependencies would surely want to use the Parboiled parser if he doesn't really need ANTLR. The same applies to a user already using ANTLR.

I cannot find another good reason except those two.

He who can do more can do less, don't you think?

I imagine it could be an advantage for big files or for performance reasons to change the parser.

I think the use case for TOML is configuration, not data transfer, so it would replace .properties files (is YAML ever used for data transfer?). If this is the case, then large files aren't much of a concern.

Besides, if the Maven dependencies are marked as provided, a user who already has Parboiled among his dependencies would surely want to use the Parboiled parser if he doesn't really need ANTLR. The same applies to a user already using ANTLR.

In that case, alternative parsers should probably be in a different artifact, e.g. jtoml-parboiled or jtoml-antlr. However, you could make it a little more user-friendly by using the ServiceLoader to automatically pick up the TomlParser from the classpath if none is specified.

He who can do more can do less, don't you think?

He who can do less can do more, but he who can do less creates less work for himself. The customisable parser feels very YAGNI and premature to me.

I agree with you about the multiple artifacts and that there is no need for large files at the moment, but to me it does not feel YAGNI yet; I mean, the library doesn't feel bloated with insane features.

I was also thinking about using ServiceLoader to pick up an implementation available on the classpath and fall back if needed. It seems to be an elegant way to let the user choose without configuration.

I'm not against making the Parboiled or ANTLR parser the default one and making the other one a supplementary artifact, depending on which is best.

Hm, well. My main concern is creating an ANTLR grammar for TOML. I try to avoid language- or implementation-specific stuff in the grammar, but because of my limited knowledge of ANTLR I might end up with tightly coupled code first.

Maybe MoriTanosuke@bf8b13a helps with providing something like a plugin mechanism to dynamically load implementations of TomlParsers.

However, you can check out my branch antlr and I really appreciate any kind of feedback that will help me get to a better approach with ANTLR. :-)

I have looked at your code a little, but I'm no ANTLR guru, so I will need more time before I can tell you something interesting about it.

I already split the interface during some refactoring (see commit acc7bac), and I think I will implement something with the ServiceLoader tomorrow, defaulting to the current default parser implementation BuiltinTomlParser.java (which could be renamed to BasicTomlParser or SimpleTomlParser like you did) until a better one gets committed (Parboiled/ANTLR).

Thank you both for your comments 👍

No problem, I think I will change bits of the grammar whenever my level of understanding improves. I am reading http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference to get started with ANTLR v4.

I'll pull acc7bac first to refactor my interfaces to match your code. Then I can pull again when you got something working with the ServiceLoader to give it a test run with my ANTLR code. :-)

I will commit around noon (GMT time) 😄

nice work... I still hope though that you keep the "Simple" approach too without dependency on ANTLR etc..

atm I like the small footprint of these classes, hope that also the validator goes through...

Yes, as discussed above I think that the best approach is to have a single jar with no dependencies whatsoever, and additional ones for those who want a specific parser used.

It should not be that difficult to add a validation feature to the current code (depending on what level of validation we're talking about).

I think it's possible to ship jtoml without all the additional jars for ANTLR and have the ServiceLoader discover and auto-load everything needed for a new implementation, right? At least that's how I read 826b651

So if I want to use ANTLR with jtoml I can run it with some additional jars and have it auto-discover the new implementation. How do I tell it to use the new implementation? Do I have to overwrite a property me.grison.jtoml.TomlParser from 826b651#L2L-1 on the command line? Or do I simply replace the already existing jar with the default implementation with another jar?

http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html has a good explanation of ServiceLoader, but not much practical advice...

So if I want to use ANTLR with jtoml I can run it with some additional
jars and have it auto-discover the new implementation. How do I tell it to
use the new implementation?

If the ServiceLoader discovers an implementation, it should be used
automatically, unless you provide your own instance when parsing.

So if I want to use ANTLR with jtoml I can run it with some additional jars and have it auto-discover the new implementation. How do I tell it to use the new implementation? Do I have to overwrite a property me.grison.jtoml.TomlParser from 826b651#L2L-1 on the command line? Or do I simply replace the already existing jar with the default implementation with another jar?

Normally you should just create a file named me.grison.jtoml.TomlParser in the META-INF/services/ folder of your JAR (with content foo.bar.toml.AntlrTomlParser, for example). The ServiceLoader will detect that two files have the same name (the built-in one and the one in your additional JAR) and will effectively provide both TomlParser instances. The Toml class iterates over them and takes any implementation that is not the built-in one, falling back to the built-in one if no additional implementation can be found. It also throws an exception if more than two additional implementations are found.
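The selection rule described above could look roughly like the sketch below. It is not the actual jtoml code: the wrapper class `ParserSelectionSketch`, the empty marker-style `TomlParser` interface and the method names `select` and `discover` are hypothetical. The sketch also assumes that a second additional implementation is already an error; the real threshold in jtoml may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ParserSelectionSketch {
    /** Hypothetical service interface; parsing methods omitted for brevity. */
    interface TomlParser { }

    /** Stand-in for the built-in, dependency-free parser. */
    static class SimpleTomlParser implements TomlParser { }

    /** Picks the parser to use from whatever was discovered. */
    static TomlParser select(Iterable<TomlParser> discovered) {
        List<TomlParser> additional = new ArrayList<TomlParser>();
        for (TomlParser candidate : discovered) {
            // Ignore the built-in parser; it is only the fallback.
            if (!(candidate instanceof SimpleTomlParser)) {
                additional.add(candidate);
            }
        }
        if (additional.isEmpty()) {
            return new SimpleTomlParser(); // nothing extra on the classpath
        }
        if (additional.size() > 1) {
            // Assumption: ambiguity is rejected rather than resolved silently.
            throw new IllegalStateException(
                    "Several TomlParser implementations found: " + additional.size());
        }
        return additional.get(0);
    }

    /** Runs the lookup against META-INF/services entries on the classpath. */
    static TomlParser discover() {
        return select(ServiceLoader.load(TomlParser.class));
    }
}
```

Keeping `select` separate from the `ServiceLoader` call makes the rule easy to unit test with plain lists, without having to put provider files on the test classpath.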

The core jar shouldn't provide a service. If it finds none, then use the
built-in, if it finds one service, use it. Since ServiceLoader scans the
classpath, it should probably only be invoked once, if possible.

I'm not sure how to handle multiple services. Forcing the user to specify
is safer, but might be annoying? You could also argue that if the user
doesn't specify, they don't care, so just use the first one.

The core jar does have the Toml frontend class that uses a TomlParser internally. It falls back correctly to the default parser SimpleTomlParser if it cannot find an additional TomlParser implementation, but it uses any other one if it finds it.
All this is done in a static initializer (see the file) in the Toml frontend class, so that it happens only once, the first time Toml is referenced.
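As a rough illustration of the "only once" part (again a sketch, not the real Toml class; `StaticInitSketch`, the `LOOKUPS` counter and the `parser()` accessor are made up for the example), a static initializer runs when the class is first initialized, so the classpath scan is not repeated:

```java
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicInteger;

class StaticInitSketch {
    /** Counts lookups so the single initialization is observable. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Hypothetical service interface; parsing methods omitted for brevity. */
    interface TomlParser { }

    /** Stand-in for the built-in parser. */
    static class SimpleTomlParser implements TomlParser { }

    static class Toml {
        private static final TomlParser PARSER;

        // Executed by the JVM once, when Toml is first initialized.
        static {
            LOOKUPS.incrementAndGet();
            TomlParser found = null;
            for (TomlParser candidate : ServiceLoader.load(TomlParser.class)) {
                found = candidate; // take the first discovered implementation
                break;
            }
            PARSER = (found != null) ? found : new SimpleTomlParser();
        }

        static TomlParser parser() {
            return PARSER;
        }
    }
}
```

Calling `StaticInitSketch.Toml.parser()` repeatedly returns the same instance, and the counter stays at one because class initialization is guaranteed by the JVM to happen a single time.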

Ok, I think I understood. 👍 I'll try to pull your changes and give it a try with my ANTLR parser as an additional TomlParser.

After pulling your modifications and having a look I think it's better if I delete my fork of jtoml and create a new repository to build a pure JAR project with my parser and the dependencies needed to run it. Then my JAR can be added to the classpath when running the frontend class from your jtoml.

Ok, I pushed my new repository to GitHub now: https://github.com/MoriTanosuke/jtoml-antlr It builds, but my JUnit tests are still failing.

Ok, I will reference your project in the README today when I have some time.
I just checked the repository, and what's missing is the file located in META-INF/services/ to declare your TomlParser instance.

Ah, my parser is not really working at the moment so you should mark it as work in progress.

I added the file for my parser instance.