antlr / antlr5

Design - Compile the runtime Lexer to WebAssembly

ericvergnaud opened this issue · comments

Beyond tooling issues, we also need to deal with paradigms that cannot work with WebAssembly.

In the current runtime, the Lexer is an abstract class, and the generated actual XXXLexer inherits from it.
This paradigm won't work with WebAssembly, especially not across language targets.

Looking at a generated XXXLexer, it doesn't provide behavior, rather it provides data that the runtime Lexer will use.
Therefore an idea that comes to mind is to evolve the design as follows:

  • the generated XXXLexer becomes a standalone class that:
    • provides data to a runtime Lexer instance
    • forwards calls such as nextToken to that Lexer instance
  • the data itself sits in a LexerData record (data class in Kotlin)
  • the runtime Lexer becomes a concrete class that requires a LexerData record when instantiated
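The three bullets above can be sketched in Kotlin as follows. This is only an illustration under stated assumptions: the fields of LexerData, the Token type, and the placeholder tokenization are hypothetical and are not taken from the actual ANTLR runtime.

```kotlin
// Hypothetical sketch: field names and the Token type are illustrative assumptions.

// Pure data emitted by the code generator: no behavior, no inheritance.
data class LexerData(
    val grammarFileName: String,
    val ruleNames: List<String>,
    val symbolicNames: List<String?>,
    val serializedAtn: List<Int>,
)

data class Token(val type: Int, val text: String)

// Concrete runtime lexer: a LexerData record is required at construction time.
class Lexer(private val data: LexerData, private val input: String) {
    private var position = 0

    // Placeholder logic so the sketch runs: one token of type 1 per character.
    // A real runtime would interpret data.serializedAtn instead.
    fun nextToken(): Token {
        if (position >= input.length) return Token(EOF, "<EOF>")
        val ch = input[position++]
        return Token(1, ch.toString())
    }

    fun ruleName(index: Int): String = data.ruleNames[index]

    companion object {
        const val EOF = -1
    }
}

// Generated class: standalone, provides the data, forwards calls to the runtime Lexer.
class XXXLexer(input: String) {
    private val lexer = Lexer(DATA, input)

    fun nextToken(): Token = lexer.nextToken()

    companion object {
        // Illustrative values only; a generator would fill these from the grammar.
        val DATA = LexerData(
            grammarFileName = "XXX.g4",
            ruleNames = listOf("ID", "WS"),
            symbolicNames = listOf(null, "ID", "WS"),
            serializedAtn = emptyList(),
        )
    }
}
```

With this shape, XXXLexer uses composition rather than inheritance, so the boundary between generated code and runtime is a set of plain function calls plus one data record.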

My plan is to first make the above work in Kotlin, then compile to Wasm.
Your comments on the proposed design are welcome.

(and more generally, I plan to learn from the lexer migration, and avoid big mistakes when migrating the parser)

To me this seems like a plan that makes sense, and it is great to start seeing experiments with WASM!

This paradigm won't work with WebAssembly, especially not across language targets.

I think this depends on your goals when exposing the WASM module to target languages.
Which parts of the lexer or parser do you want to expose?

As you know, Wasm doesn't contain classes, only globals and functions.
Antlr5 needs to connect 3 objects:

  1. a wasm runtime lexer/parser (our code)
  2. a wasm generated lexer/parser (generated from the grammar)
  3. a host language wrapper (generated by the target add-on)

The idea that a class in 2 can derive from a non-class in 1 sounds impossible to me.
Rather it will call functions from 1 and provide callbacks to 1.
Similarly, 3 can and will have bindings to call into 2, but it can't derive from 1 or 2, because to achieve that it would need the ancestor class to genuinely exist in the host language.

So I'm not sure what you mean by what is being exposed. What am I missing that would make it possible to use inheritance across 1, 2 and 3?

Ahhh, I get what you mean now.
But wait, what you want to do is have:

  • a WASM module (.wasm binary) for the runtime
  • a WASM module for the generated grammar
  • a target language wrapper over the generated grammar module

Is that correct?
If so, I just don't see why we would want two separate modules. Wasn't the idea to have an all-in-one bundle?

What am I missing that would make it possible to use inheritance across 1, 2 and 3

I'd leave out 3, and focus on 1 and 2, if my understanding of what you want to do is indeed correct.
For that to be possible I guess you'd need the WASM component model.

However, the way Kotlin will implement the component model isn't decided yet, as far as I know.
Thus, it's also impossible to know how we will be able to expose definitions from the WASM module, and what the limitations will be.

Can't we start with migrating to Gradle and running all tests with Kotlin Wasm? Is it possible?

I would do that first, yeah. We can discuss it in tomorrow's call if needed.

Looking at a generated XXXLexer, it doesn't provide behavior,

Does this mean no actions or predicates?

No, these would be invoked via callbacks rather than inlined. We might treat actions and predicates written in Kotlin differently since these could be inlined, but we're not there yet... (it's an optimization and we shouldn't optimize first)
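A minimal sketch of what "invoked via callbacks" could look like. The interface and class names below are hypothetical, and tryRule is only a stand-in for the point where a runtime would reach a predicate or action; none of this is the actual antlr5 API.

```kotlin
// Hypothetical callback contract between the runtime lexer and the generated code.
interface LexerCallbacks {
    // Returns true if the semantic predicate holds.
    fun sempred(ruleIndex: Int, predIndex: Int): Boolean

    // Executes the embedded action.
    fun action(ruleIndex: Int, actionIndex: Int)
}

// Stand-in for the runtime side: it knows nothing about the grammar's code,
// it only calls back through the interface it was given.
class CallbackLexer(private val callbacks: LexerCallbacks) {
    fun tryRule(ruleIndex: Int, predIndex: Int, actionIndex: Int): Boolean {
        if (!callbacks.sempred(ruleIndex, predIndex)) return false
        callbacks.action(ruleIndex, actionIndex)
        return true
    }
}

// Stand-in for the generated side: supplies the bodies of predicates and actions.
class XXXLexerCallbacks : LexerCallbacks {
    val log = mutableListOf<String>()

    // Illustrative predicate: only rule 0 is allowed to match.
    override fun sempred(ruleIndex: Int, predIndex: Int): Boolean = ruleIndex == 0

    override fun action(ruleIndex: Int, actionIndex: Int) {
        log += "action $actionIndex in rule $ruleIndex"
    }
}
```

Inlining Kotlin-written actions, as mentioned above, would be a later optimization on top of this indirection.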

Can't we start with migrating to Gradle and running all tests with Kotlin Wasm? Is it possible?

I'm not sold on this approach because, due to I/O stuff, it would require using WASI, which we're not looking to support (there is no WASI for the web). I'd rather get as close as possible to our target architecture before enabling wasm compilation. But as suggested, we can discuss this later today.

Ahhh, I get what you mean now. But wait, what you want to do is have:

  • a WASM module (.wasm binary) for the runtime
  • a WASM module for the generated grammar
  • a target language wrapper over the generated grammar module

Is that correct? If so, I just don't see why we would want two separate modules. Wasn't the idea to have an all-in-one bundle?

No, the idea is to have a reusable wasm runtime. I can see 3 benefits:

  • faster build time (we don't recompile the full runtime on each grammar change)
  • shared module for deployments that support multiple grammars
  • sticks to a proven and clear separation of concerns

What am I missing that would make it possible to use inheritance across 1, 2 and 3

I'd leave out 3, and focus on 1 and 2, if my understanding of what you want to do is indeed correct. For that to be possible I guess you'd need the WASM component model.

How do you run 1 and 2 without 3?

However, the way Kotlin will implement the component model isn't decided yet, as far as I know. Thus, it's also impossible to know how we will be able to expose definitions from the WASM module, and what the limitations will be.

Yes, in the end state we should rely on the component model. Given the rather slow speed at which things get done, though, we could rely on wasm-merge in the short term (a tool that merges two or more modules).

I'm not sold on this approach because, due to I/O stuff, it would require using WASI, which we're not looking to support (there is no WASI for the web).

If I understand correctly, Strumenta's antlr-kotlin currently supports the Kotlin wasm target (I see a wasmJsMain directory there).

Yes, my understanding is that in Kotlin there are two WASM targets: one generating WASM and a JS wrapper, intended for running in the browser, and a second target producing WASM for WASI. Both should be supported by the Kotlin target for ANTLR 4.

@lppedd please correct me if I am wrong

Yes, Strumenta's repository supports both WASM targets (but overall, it supports *all* Kotlin targets).

due to i/o stuff

It really depends on what I/O stuff we are talking about. If I/O is moved out of the test infrastructure (or at the very beginning, before passing the ball to WASM), targeting wasmJs shouldn't be an issue.

Moving I/O out is indeed one of the preliminary activities required.
Another one is to define an API that will be exposed by the runtime and the generated parser.

Looking at a generated XXXLexer, it doesn't provide behavior,

Does this mean no actions or predicates?

No, these would be invoked via callbacks rather than inlined. We might treat actions and predicates written in Kotlin differently since these could be inlined, but we're not there yet... (it's an optimization and we shouldn't optimize first)

For actions and predicates I would suggest looking at the approach used in templating languages like Handlebars.

One of the things that makes Handlebars so portable is that the "helpers" are provided externally, so there is no host language syntax leaking into the template.
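To illustrate the idea of externally provided helpers, here is a rough sketch in which the grammar would refer to predicates and actions by name only, and the host registers their bodies. HelperRegistry and the helper names are hypothetical; this is neither the Handlebars API nor an existing ANTLR feature.

```kotlin
// Hypothetical registry: the grammar would mention helpers only by name,
// so no host-language syntax appears in the grammar file.
class HelperRegistry {
    private val predicates = mutableMapOf<String, (String) -> Boolean>()
    private val actions = mutableMapOf<String, (String) -> Unit>()

    fun registerPredicate(name: String, body: (String) -> Boolean) {
        predicates[name] = body
    }

    fun registerAction(name: String, body: (String) -> Unit) {
        actions[name] = body
    }

    // Called by the runtime when it reaches a named predicate.
    fun evalPredicate(name: String, text: String): Boolean {
        val body = predicates[name] ?: error("No predicate registered as '$name'")
        return body(text)
    }

    // Called by the runtime when it reaches a named action.
    fun runAction(name: String, text: String) {
        val body = actions[name] ?: error("No action registered as '$name'")
        body(text)
    }
}
```

A host application would register its helpers once, for example `helpers.registerPredicate("isKeyword") { it in setOf("if", "else") }`, and the same grammar could then be reused from any target language that supplies helpers with the same names.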

I'm aware that the functionality in #51 does not follow this approach at all, but a major release like v5 seems like a good time to change the direction on actions and predicates.