ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.

Matcher Iterative Syntax Highlight Example

darth-cheney opened this issue.

Hey all,
Can you provide an example of using a Matcher for syntax highlighting? The docs are a little unclear about how one should actually use this interface. Thanks!

Can you give a bit more detail about what you're trying to do? Ohm can be used to implement syntax highlighting, but it doesn't include any special support for it. Any example would depend much more on the tool you're trying to do syntax highlighting in (e.g. VSCode) than on Ohm itself.

Hi @pdubroy,

I may have misread the Matcher documentation and assumed it was describing syntax-highlighting-like behavior in IDEs.

Here's a quick summary of what we are trying to accomplish. We are using Ohm for the scripting language of our authoring environment, which we are provisionally calling Simpletalk.

Simpletalk scripts are made up of "message handlers." The basic components can be found in this file, but you can see the issue we are having with iterative syntax highlighting using just these few rules:

    Script
      = ScriptPart+ end

    ScriptPart
      = MessageHandler
      | comment
      | lineTerminator

    MessageHandlerOpen
      = "on" messageName ParameterList?

    MessageHandlerClose
      = "end" messageName

    MessageHandler
      = MessageHandlerOpen lineTerminator StatementList? MessageHandlerClose

We have an in-browser editor in which we would like to highlight syntax as the user types, e.g. after each character input. Ideally we would just use a new semantic operation to do this, but that is where we have run into problems. It seems the only way to get a parse covering all the rules in a script, without explicitly matching against specific sub-rules each time, is for the script to already have a "correct" structure when the user enters a character.

For example, I'm not sure how one could highlight portions of a MessageHandler while the user is still typing it. Since Ohm parses from the grammar's default start rule when no rule is specified, the match goes through Script and tries each ScriptPart alternative. But of course there is no valid MessageHandler yet, because the user is still typing it out, even though there may be a valid MessageHandlerOpen and valid StatementLines in the script. The match will fail. Concretely, imagine I've typed something like:

    on myCustomHandler argument1, argument2
        tell first butto[...]

The [...] is where I stopped typing. None of this is a valid Script, because there is no valid MessageHandler (it requires a MessageHandlerClose). However, there is a valid MessageHandlerOpen and there are some valid sub-rules within the (incomplete, and therefore also invalid) StatementLine. Ideally the words in the opening line would be highlighted, as would tell, which is a special keyword.

The only other way I can think to do this is to try to match on specific rules every time, but that "feels" like it's not the correct thing to do, right? It seems like a lot of overhead to keep a list of a dozen or more rules and create match objects for them every time the user inputs a single character.

I see. Take a look at what @sakekasi and @alexwarth did in Seymour: https://github.com/harc/seymour/blob/master/lang/grammar.js#L133. Their approach was to define a tokens rule for the purposes of syntax highlighting. Maybe something like that would work for you?
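To make the tokens idea concrete, here is a minimal sketch in plain JavaScript of what such a rule does. It is not Seymour's grammar and not Ohm code. In Ohm the rule might look roughly like `tokens = (token | any)*` (a hypothetical rule), and the important property is that it always succeeds: anything it doesn't recognize is consumed one character at a time, so an unfinished MessageHandler still produces highlightable tokens. The keyword list (`on`, `end`, `tell`) and the token patterns below are assumptions for illustration only.

```javascript
// Sketch of an always-succeeding token scan, mirroring a hypothetical Ohm rule
//   tokens = (token | any)*
// Assumption: keywords and token shapes are illustrative, not Simpletalk's real lexical rules.
const KEYWORDS = new Set(["on", "end", "tell"]);

// Each pattern is sticky (/y), so it only matches starting exactly at lastIndex.
const TOKEN_PATTERNS = [
  ["number", /\d+(?:\.\d+)?/y],
  ["word", /[A-Za-z_][A-Za-z0-9_]*/y],
  ["space", /\s+/y],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [type, re] of TOKEN_PATTERNS) {
      re.lastIndex = pos;
      const m = re.exec(source);
      if (m) {
        const text = m[0];
        if (type !== "space") {
          // Words are split into keywords and identifiers for highlighting.
          const kind =
            type === "word" ? (KEYWORDS.has(text) ? "keyword" : "identifier") : type;
          tokens.push({ kind, text, start: pos, end: pos + text.length });
        }
        pos += text.length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Counterpart of Ohm's `any`: consume one character so the scan never fails.
      tokens.push({ kind: "unknown", text: source[pos], start: pos, end: pos + 1 });
      pos += 1;
    }
  }
  return tokens;
}
```

For the partial input above, `tokenize("on myCustomHandler argument1, argument2\n    tell first butto")` classifies `on` and `tell` as keywords and the other words as identifiers, without needing a closing `end`. Because the scan never fails, every keystroke yields spans to color, which is the trade-off a tokens rule makes against the full structural parse.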

BTW, you can use the Matcher's replaceInputRange method for this use case; it preserves the memo table between parses. Only memo table entries overlapping the changed input are discarded.
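Since the editor re-parses after every keystroke, it may help to see why keeping the memo table pays off. Below is a toy model in plain JavaScript of the invalidation rule described above. It is not Ohm's actual implementation: the `ToyMemoTable` class, its `record` method and the `examinedLength` field are hypothetical names for this sketch. Each entry remembers where a rule started and how far the parser looked; an edit drops entries that looked at the edited text and shifts entries that start after it.

```javascript
// Toy model (hypothetical, not Ohm internals) of memo-table invalidation on edit.
class ToyMemoTable {
  constructor(input) {
    this.input = input;
    this.entries = []; // each entry: { pos, rule, examinedLength }
  }

  // Remember that `rule` was tried at `pos` and looked at `examinedLength` chars.
  record(pos, rule, examinedLength) {
    this.entries.push({ pos, rule, examinedLength });
  }

  // Replace input[startIdx, endIdx) with `str`, keeping still-valid entries.
  replaceInputRange(startIdx, endIdx, str) {
    const delta = str.length - (endIdx - startIdx);
    this.input = this.input.slice(0, startIdx) + str + this.input.slice(endIdx);
    this.entries = this.entries.flatMap((e) => {
      // Looked only at text before the edit: still valid, position unchanged.
      if (e.pos + e.examinedLength <= startIdx) return [e];
      // Starts after the edit: same text, shifted by the change in length.
      if (e.pos >= endIdx) return [{ ...e, pos: e.pos + delta }];
      // Overlaps the edited range: discard so it is recomputed on the next parse.
      return [];
    });
    return this;
  }
}
```

For example, with input `"on myHandler\nend myHandler"` and entries at positions 0, 3 and 13, inserting `" x"` at position 12 keeps all three entries and moves the last one to position 15, while changing a character inside `myHandler` on the first line discards the two entries that covered it. In Ohm itself you don't manage this table: you keep one Matcher per editor buffer (from `grammar.matcher()`), call `replaceInputRange(startIdx, endIdx, str)` as the user types, then call `match()` again.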