jank-lang / jank

A Clojure dialect hosted on LLVM with native C++ interop

Home Page: https://jank-lang.org

Add editor syntax highlighting for jank

jeaye opened this issue · comments

jank supports a new special form called native/raw. It works in place of Clojure's interop syntax and allows for inline C++. It also supports interpolating jank expressions into that C++. Docs on the rationale and final solution are here: https://github.com/jank-lang/jank/blob/main/DESIGN.md#interop

Right now, all of this gets highlighted as a string in vim/emacs/vscode, but the interpolated forms are Clojure code and should be highlighted accordingly. Ideally, normal code completion, REPL behavior, etc. can work from within those forms, but we can take this one step at a time.

We may be able to make the syntax highlighting changes here in https://tree-sitter.github.io/tree-sitter/ and call it a day. But we might also consider getting into the vim/emacs/vscode configurations for Clojure and then forking them for jank to add this support. Would be your call on how to tackle this. I use vim and would love it to have this working, but we'll want great tooling for everyone, so might as well start with whatever you use.

  • vim
  • emacs
  • vscode
  • sublime
  • pulsar

A question: what do you think about replicating what ClojureScript already does? In CLJS, this is a way to do interop:

(js* "10 + ~{}" 30)

Maybe a (native/raw "std::cout << ~{} << \"\n\"" 20) would need less editor support and Kondo could work without changes.

Someone brought this up at the Conj as well. I actually wasn't familiar with the CLJS interpolation syntax, but I think consistency makes sense. Kondo would still need to understand that native/raw is a special form, but the code for ~{} could likely be reused.
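For anyone following along, here is a rough sketch of the positional ~{} substitution idea discussed above. It is written in Python purely for illustration and is not how jank or ClojureScript implement it; the interpolate helper and its argument-count check are assumptions made for the example, and the arguments are taken to be already-compiled C++ snippets passed as strings.

```python
import re

# Hypothetical helper, for illustration only: fills CLJS-style positional
# `~{}` placeholders from left to right with the given arguments.
PLACEHOLDER = re.compile(r"~\{\}")


def interpolate(template: str, *args: str) -> str:
    """Replace each `~{}` in `template` with the next argument, in order."""
    slots = PLACEHOLDER.findall(template)
    if len(slots) != len(args):
        raise ValueError(f"expected {len(slots)} arguments, got {len(args)}")
    remaining = iter(args)
    # A function replacement is used so backslashes in the arguments are
    # inserted literally instead of being treated as regex escapes.
    return PLACEHOLDER.sub(lambda _match: next(remaining), template)


print(interpolate("10 + ~{}", "30"))
print(interpolate('std::cout << ~{} << "\\n"', "20"))
```

Because the placeholders carry no code of their own, the string stays plain C++ plus markers, and the interpolated forms remain ordinary arguments that existing Clojure tooling already understands.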

I'll try to implement this in Pulsar, but first this ticket on Tree-Sitter needs to be done: sogaiu/tree-sitter-clojure#52.

Otherwise, "injected" C++ grammar will highlight the whole string, not the contents.

Excellent! Thanks for the help. I'm looking forward to seeing this. Having great syntax highlighting support for the C++ inside native/raw and the jank inside its interpolation is going to be a huge win. Right now, it looks bad.

Following up on Vim support for this. I've done some research into foreign syntax regions, and managed to get something working, but it's clunky and it flashes when I move the cursor. I suspect it's due to the performance of the jankNativeRawString regex. When it's just +\zs[^"]*\ze"+, which supports only one line, it works reliably. Multi-line matching is much harder.

syn include @CPP syntax/cpp.vim
syn match jankNativeRawString +"\zs\_.\{-}\ze"+ contains=@CPP
syntax region jankNativeRaw matchgroup=Special start=+(native/raw+  end=+)+ contains=jankNativeRawString

This is a good starting point for moving forward, but I wonder if this is complex enough to require semantic highlighting via LSP instead. Either way, if someone wants to pick this up and run with it, I'd love to get C++ highlighting for all of those functions.

Of course, this doesn't support going back to Clojure with interpolation.

Well, it is possible:
image

There are some issues with a string inside a string, because it needs to be escaped... but it's better than having nothing I guess :)

Nice! What editor did you do that in? When I had it around that point in vim, with foreign syntax regions, it flickered whenever I moved around or edited text. Are you seeing that?

I did it in the Pulsar editor, which is a fork of Atom. I used tree-sitter injections, which basically "inject" one language into another.

In Clojure, a string is usually represented as (str_lit), but in this patched tree-sitter version it is represented as (str_lit " (str_content) "). That lets me say: "if a (str_content) is a child of a str_lit, which is a child of a list_lit whose first element is native/raw, then inject the C++ language into it".

All the magic happens on this PR: pulsar-edit/pulsar#729, but more specifically, on these lines: https://github.com/pulsar-edit/pulsar/pull/729/files#diff-ed2a3159c63d8f3c78945b50e99346ef08662b9107182d3db84b9be755bd581fR54-R61

As this is all a feature of the editor (injections are used everywhere, all the time), I don't really experience any flicker.
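To make the injection rule above concrete, here is a toy model of it in Python. This is only a sketch: the Node class and injection_targets function are hypothetical, the tree is built by hand rather than by tree-sitter, and the real implementation lives in the Pulsar PR linked above. Only the node type names (list_lit, sym_lit, str_lit, str_content) come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a syntax tree node (not the tree-sitter API)."""
    type: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def injection_targets(node):
    """Yield str_content nodes of strings that directly follow native/raw."""
    forms = node.children
    if (
        node.type == "list_lit"
        and len(forms) >= 2
        and forms[0].type == "sym_lit"
        and forms[0].text == "native/raw"
        and forms[1].type == "str_lit"
    ):
        # Only the content between the quotes is a candidate for C++.
        yield from (c for c in forms[1].children if c.type == "str_content")
    # Keep walking so nested lists are checked as well.
    for child in forms:
        yield from injection_targets(child)


tree = Node("source", children=[
    Node("list_lit", children=[
        Node("sym_lit", "native/raw"),
        Node("str_lit", children=[Node("str_content", "std::cout << 1;")]),
    ]),
    Node("list_lit", children=[
        Node("sym_lit", "println"),
        Node("str_lit", children=[Node("str_content", "hello")]),
    ]),
])

print([n.text for n in injection_targets(tree)])
```

Only the string under native/raw is selected; the string passed to println is left alone. This is why the patched grammar matters: without a separate str_content node there is nothing smaller than the whole quoted string to point the injection at.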

Oh, very cool! Great work. I'll add Pulsar to the list at the top of this thread and we can mark it off once everything's merged.

IIUC, Neovim and Emacs do or will have ways to work with tree-sitter-clojure as-is:

We haven't reached a decision about whether to modify tree-sitter-clojure yet.

Below are some of the bits we've been considering regarding this situation.

  • As mentioned in the latter part of this comment, there appear to be other grammars that have a structural similarity to tree-sitter-clojure (WRT strings and their delimiters).

  • As remarked at the end of this comment:

    I haven't seen any recommendations in the official tree-sitter docs regarding structuring nodes in one's grammar to work better with injections; maybe it could be suggested as an addition. Though at this point I wonder how much good it would do.

    Given the large number of grammars in existence (> 200), on the surface it seems unlikely to me that they will all be made to parse strings in a particular manner.

  • There have been some discussions about trying to share queries, or parts of them, among editors. I don't know what the status of this is, but it seems to me that if that is pursued in some fashion, support for certain predicates (e.g. #offset! -- IIUC this is used in Neovim to help with some injection cases) might increase.

I don't know much about this jank project, but at a glance it sounds very neat. Are there any new syntax constructs in jank that aren't present in Clojure apart from the interpolation in the native/raw strings? If there are, we could consider supporting them in tree-sitter-clojure or perhaps a derivative of tree-sitter-clojure.

Are there any new syntax constructs in jank that aren't present in Clojure apart from the interpolation in the native/raw strings?

At this moment, no. jank is meant to be strongly Clojure[Script] source compatible and the only differences it should have will be around how it handles interop. For now, that's just via native/raw, but it's possible that the syntax is later extended to more seamlessly support working with C++. I don't foresee this happening in the coming few years, at least.

If there are, we could consider supporting them in tree-sitter-clojure or perhaps a derivative of tree-sitter-clojure.

That would be superb and I really appreciate the support. There are three primary things needed for strong native/raw support:

  1. Highlighting the contents as C++ (and ideally getting LSP to work with the C++ here, too, understanding the scope of the expression)
  2. Highlighting the interpolation as jank again (and ideally getting back to jank/Clojure LSP in there)
  3. Handling braces/indentation as C++ would, rather than how Clojure would

It's worth noting that these can nest infinitely, so jank -> native/raw -> interpolation -> native/raw -> interpolation and so on is possible. This is a more pathological case, though, and I don't think we'd want our happy path to be hindered to support this, if it came to that.

Again, thanks for the interest here. A great developer tooling experience can make a language blossom just as much as a poor one can ensure it wilts.