[RFC] Merge with kak-tree-sitter to provide a better out-of-the-box experience?

krobelus opened this issue 4 months ago · comments

Johannes Altmanninger commented 4 months ago

One of the tree-sitter plugins, kak-tree-sitter, seems to be getting some traction.

I believe it would be beneficial for the Kakoune ecosystem if there was a single standard plugin for code intelligence.

Motivation:

- It will be easier for users to get going (since it's one plugin instead of multiple, possibly inconsistent ones).
- Where there is feature overlap (e.g. highlighting), we can provide a good default experience without forcing the user to choose.
- With respect to overlapping features, there will be no need to worry about version compatibility between the plugins.
- It would bundle developer efforts, enabling us to reuse solutions for the problem of "interacting with Kakoune from a long-running process" ¹ ² ³.
- Since at least LSP has an infinitely expanding scope, we want to push regular updates. We should make that as smooth as possible for the user, for example by going through the system package manager. Presumably it will be easier for tree-sitter if we take it under our umbrella.
- In terms of implementation, it would probably be more elegant for both LSP and tree-sitter to share the same view of the buffer (avoiding potential inconsistencies).
- It will be easier to share things like test infrastructure.
- There are some learnings from kak-lsp we haven't shared with kak-tree-sitter yet; if both live in the same repo, that sharing will follow naturally.

Naming

Given it's globally visible, I think kak-lsp is a poor name and I'd like to rename it. So far kakoune-lsp seemed like the obvious choice.

For a general "batteries-included code intelligence" plugin, however, what we want is something like "Language support for Kakoune".
I haven't found a good name, maybe kakoune-langs, kakoune-language-support or kakoune-code-intelligence.

Possible Course of Action

As I haven't started using kak-tree-sitter yet, I'd keep my hands off that part of the code.
But I'm happy to do what is needed to get the initial integration done.

The strawman proposal is to

- merge the code into a single Git repo (preserving both Git histories)
- provide a single binary and config file (though with plans to deprecate it, see ³)
- (TODO)

FAQ

Why not switch to Helix?

Certainly a possibility; however, I'd miss the easy extensibility. Kakoune allows us to keep simple things simple and integrates well into a Unix system.

What if I don't want tree-sitter? Or LSP?

We will provide ways to turn them off at run-time and maybe also compile-time.

What happens with existing kak-lsp packages?

If we do the rename we can definitely provide updates under that alias to avoid users installing old packages.

Should we also provide key mappings OOTB?

Not sure, probably not by default though we could definitely make it opt-in to use the recommended mappings.

How do we define the scope of this? What else will be added in future?

For much the same reasons as above, I would be in favor of adding (optional) features that are either

- related to language support and can benefit from this (like debugging, possibly via DAP; cc @jdugan6240?)
- or that we need to implement anyway (like snippets; ours is based on a pre-rewrite version of @occivink's kakoune-snippets).

Are there any other examples we can think of?

Background

I've discussed this before with @phaazon and separately @topisani (conversation starting here)

¹ For example, there should be a single interface for enabling debug logs, to be written to the *debug* buffer.
² Things like kak-lsp crashes are too hard to track down; they should at least show up in the *debug* buffer.
³ kak-lsp.toml is inflexible and clunky; there should be a mechanism to override only some language configs; also it should be possible to do everything in kakrc without the confines of TOML.

Matt Schick · Answer 1 · Wed Jan 24 2024 13:21:57 GMT+0800 (China Standard Time)

My only question comes from an end user perspective:

In using kak-lsp, I decide on the language server I want to use for the language I’m working in. So would tree-sitter step in for operations that my chosen language server can’t support, or for situations where I haven’t bothered to pick a language server?

I very much agree on the idea that merging them would provide a cohesive experience that handles overlap. I’ve never taken the time to set up kak-tree-sitter because I’ve always used kak-lsp and assumed I would experience weirdness by running both at once (I don’t know if that’s even true, but I shied away from even trying it).

James Dugan · Answer 2 · Thu Jan 25 2024 06:20:32 GMT+0800 (China Standard Time)

To be completely honest, I'm not sure how in favor I am of adding a Debug Adapter Protocol client implementation to kak-lsp (or whatever it ends up being called). The reasons are as follows:

- Editing (which LSP and tree-sitter both help facilitate) and debugging are very different and have different UI/UX requirements. Handling both in one plugin seems like feature creep to me.
- If I'm being honest, I consider kak-dap a failed experiment. While I was able to get quite far with just Kakoune buffers, Kakoune ultimately (by design) doesn't offer the feature set necessary to create a robust debug UI from within Kakoune itself. To that end, I'm actually starting over in a new repository with the idea of using an external TUI as the main debug UI, with Kakoune itself mainly serving to fill the "jump to current line" functionality. This would represent a drastic change from how kak-lsp currently works, and I don't think it would be very viable to work this into kak-lsp without a completely new architecture.
- While many language servers are fairly easy to install, most debug adapters are actually VSCode plugins, and as a result are a pain to set up outside of VSCode. To that end, I would want to implement a system to help install these for the user to ease this pain (similarly to how vimspector works), something that I don't think kak-lsp wants to do.

For the reasons above, I don't think that including debugging capabilities in kak-lsp would be the best idea. If you want to do it anyway, though, go right ahead 🙂.

Sid Kshatriya · Answer 3 · Thu Jan 25 2024 13:00:50 GMT+0800 (China Standard Time)

I'd like to see a tree-sitter feature in core Kakoune itself. Regexes are not sufficient for a code editor in 2024. But Kakoune moves slowly/conservatively. @mawww -- does the idea of tree-sitter in Kakoune core appeal to you?

Igor Ramazanov · Answer 4 · Fri Jan 26 2024 19:22:48 GMT+0800 (China Standard Time)

I like and support the idea of merging them together.

Example of overlapping features:

- show document symbols
- jump to a symbol in the current buffer
- next / prev symbol
- scope highlighting / indent guides
- scope folding
- syntax highlighting / theming
- sticky contexts

There's also another idea to facilitate community development: extract the code which talks to Kakoune and publish it separately as a Cargo crate, expose a C ABI, and maybe create bindings for major languages (JS/Python/Java/etc.), so people could depend on it and use their preferred languages to interoperate with Kakoune. This is similar to what tree-sitter does.

The idea seems good, but it is difficult to predict whether it would work in practice.

Kratacoa · Answer 5 · Sat Jan 27 2024 04:39:50 GMT+0800 (China Standard Time)

I tried to access the link to the conversation with @topisani, but I was unable to.

postsolar · Answer 6 · Sat Jan 27 2024 05:10:04 GMT+0800 (China Standard Time)

@Kratacoa it's available on the Discord server, in the plugins channel, messages from Jan 22nd onwards.

Tobias Pisani · Answer 7 · Sat Jan 27 2024 17:03:17 GMT+0800 (China Standard Time)

> I'd like to see a tree-sitter feature in core Kakoune itself. Regexes are not sufficient for a code editor in 2024. But Kakoune moves slowly/conservatively. @mawww -- does the idea of tree-sitter in Kakoune core appeal to you?

This is probably not happening for multiple reasons, but a main one is very simple: Kakoune currently requires no external dependencies, and tree-sitter is a fairly complex one.

My suggestion is that we keep it external like this, but focus on adding hooks and integration features to Kakoune to improve usability and extend what we can do externally.

Johannes Altmanninger · Answer 8 · Fri Feb 09 2024 19:11:08 GMT+0800 (China Standard Time)

Today it's easy to configure kak-tree-sitter and kak-lsp to play well with each other.
Let's keep it this way, let both cook for a while, and see how things turn out.
If we face concrete issues that are not easily solved with a shared interface, we can revisit the merge. I'll close this to avoid confusion.
Both projects should stay aware of each other to minimize work duplication.
Some action items:

- If we find an overlapping feature that works better in either tree-sitter or LSP, deprecate the worse one.
- Refactor kak-lsp to extract useful functionality that is not specific to LSP (possible to share in future).
- Refactor kak-lsp to use kak_command_fifo/kak_response_fifo instead of inflexible single shell processes, to facilitate:
  - try to find a solution that doesn't need shell processes in the common case (ref mawww/kakoune#4127)
  - add native Kakoune options to replace kak-lsp.toml. This will add convenience and enable exotic configurations.
- Extend kak-lsp commit access to motivated contributors, to build up shared ownership.
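
To make "override only some language configs" concrete, here is a minimal sketch of the merge behaviour one might want from native options. The keys (`command`, `roots`) and the default table are illustrative assumptions, not kak-lsp's actual schema.

```python
import copy

# Illustrative defaults; the keys are assumptions, not kak-lsp's real schema.
DEFAULTS = {
    "rust": {"command": "rust-analyzer", "roots": ["Cargo.toml"]},
    "python": {"command": "pylsp", "roots": ["pyproject.toml", "setup.py"]},
}


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return defaults with per-language overrides applied key by key."""
    merged = copy.deepcopy(defaults)  # never mutate the shipped defaults
    for language, settings in overrides.items():
        # Only the keys the user mentions change; everything else is inherited.
        merged.setdefault(language, {}).update(settings)
    return merged


user_overrides = {"python": {"command": "pyright-langserver"}}
config = apply_overrides(DEFAULTS, user_overrides)
print(config["python"])
```

The user changes one key for one language and inherits the rest, which is the part a single monolithic config file makes awkward.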

Lucas Schwiderski · Answer 9 · Fri Feb 09 2024 21:15:53 GMT+0800 (China Standard Time)

> If we find an overlapping feature that works better in either tree-sitter or LSP, deprecate the worse one

A big exception: Basic highlighting is generally better in tree-sitter (faster, and you can write your own queries), but LSP's semantic highlighting can provide things on top that aren't possible with just parsing.
So to get the best of both worlds, you'd want to limit kak-lsp's configuration to just those tokens that require semantic analysis (e.g. tree-sitter highlighting a name as a regular variable, while LSP knows that it's defined as a constant).

Johannes Altmanninger · Answer 10 · Fri Feb 09 2024 21:46:22 GMT+0800 (China Standard Time)

> A big exception: Basic highlighting is generally better in tree-sitter (faster, and you can write your own queries), but LSP's semantic highlighting can provide things on top that aren't possible with just parsing.
> So to get the best of both worlds, you'd want to limit kak-lsp's configuration to just those tokens that require semantic analysis (e.g. tree-sitter highlighting a name as a regular variable, while LSP knows that it's defined as a constant).

I'm probably ignorant of some details here, but one option is to define highlighter precedence by definition order.

add-highlighter global/tree-sitter ...
add-highlighter global/lsp-semantic-tokens ...
add-highlighter global/user-tree-sitter-queries ...

Later highlighters will be rendered on top of earlier ones.

If such an ordering helps, we can make it the default. Users should not need to care about the order in which they load their plugins.
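
As a rough model of "later highlighters are rendered on top of earlier ones", the sketch below resolves which face wins at each column when layers are applied in definition order. It is a simplification under stated assumptions: real Kakoune faces are merged attribute by attribute rather than replaced wholesale, and the `constant` face name is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start column, end column exclusive, face name)


def resolve_faces(layers: List[List[Span]], length: int) -> List[Optional[str]]:
    """Return the winning face per column; later layers paint over earlier ones."""
    faces: List[Optional[str]] = [None] * length
    for layer in layers:  # definition order is precedence order
        for start, end, face in layer:
            for col in range(max(start, 0), min(end, length)):
                faces[col] = face
    return faces


# Example line: "let MAX = 10"
tree_sitter = [(0, 3, "keyword"), (4, 7, "variable"), (10, 12, "value")]
lsp_semantic = [(4, 7, "constant")]  # hypothetical face: LSP knows MAX is constant
user_queries: List[Span] = []        # nothing extra from the user in this example

resolved = resolve_faces([tree_sitter, lsp_semantic, user_queries], 12)
print(resolved)
```

In this model the LSP layer only changes the columns it has an opinion on, and anything the user defines last always wins.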

Lucas Schwiderski · Answer 11 · Fri Feb 09 2024 22:55:32 GMT+0800 (China Standard Time)

There is no split between "default tree-sitter" and "user-provided tree-sitter", so there is no issue about potentially undoing user queries. And the things I was talking about can arise regardless of whether queries are provided by the tool or by the user.

The problems that can arise when combining both highlighters are:

- Waste of resources on the LSP side when requesting tokens that have already been colored by tree-sitter
- A lot of finicky work to make LSP and tree-sitter use the same color for the same token to avoid one messing up the colorscheme of the other

Hence why optimal compatibility would be reached by having LSP only provide color for things that aren't covered by tree-sitter.
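
One way to picture "LSP only provides color for things that aren't covered by tree-sitter" is a client-side filter that keeps only tokens carrying information a parser cannot derive. This is a sketch, not kak-lsp's behaviour: the modifier names come from the LSP specification's standard set, but which of them count as "semantic only" is an assumption. Filtering on the client does not save the server any work; it only avoids repainting what tree-sitter already colored.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class SemanticToken:
    line: int
    start: int
    length: int
    token_type: str
    modifiers: FrozenSet[str] = field(default_factory=frozenset)


# Assumption: these modifiers need semantic analysis and cannot come from parsing.
SEMANTIC_ONLY_MODIFIERS = frozenset({"readonly", "deprecated", "static", "defaultLibrary"})


def semantic_only(tokens: List[SemanticToken]) -> List[SemanticToken]:
    """Keep tokens with at least one modifier that a parser alone cannot know."""
    return [t for t in tokens if t.modifiers & SEMANTIC_ONLY_MODIFIERS]


tokens = [
    SemanticToken(0, 0, 3, "keyword"),
    SemanticToken(0, 4, 3, "variable", frozenset({"readonly"})),
    SemanticToken(1, 0, 5, "function", frozenset({"declaration"})),
]
kept = semantic_only(tokens)
print([(t.token_type, sorted(t.modifiers)) for t in kept])
```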

Johannes Altmanninger · Answer 12 · Sat Feb 10 2024 02:19:23 GMT+0800 (China Standard Time)

> Waste of resources on the LSP side when requesting tokens that have already been colored by tree-sitter

OK, if that's an issue then tree-sitter should tell us what's left to color.
But that's something that can be figured out later.

> A lot of finicky work to make LSP and tree-sitter use the same color for the same token to avoid one messing up the colorscheme of the other

As clarification for everyone reading along: LSP semantic tokens don't map to raw colors but use standard Kakoune faces.
This means it will work reasonably well with any colorscheme.
If tree-sitter does the same, it should fit right in.

So one goal here is a good default mapping of "syntax/semantic token" to "Kakoune face" that both LSP and tree-sitter agree on.
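
A shared mapping of this kind could look like the sketch below. The values are names of faces that Kakoune ships by default; the choice of keys and the fallback rule (strip dotted suffixes such as `function.method` until a known name is found) are assumptions made for illustration, not an agreed standard.

```python
# Token or capture name -> Kakoune face. The table contents are illustrative.
SHARED_FACES = {
    "keyword": "keyword",
    "string": "string",
    "comment": "comment",
    "function": "function",
    "type": "type",
    "variable": "variable",
    "namespace": "module",
    "number": "value",
    "operator": "operator",
}


def face_for(token_name: str, fallback: str = "default") -> str:
    """Look up a face, trying the most specific dotted name first."""
    name = token_name
    while name:
        if name in SHARED_FACES:
            return SHARED_FACES[name]
        # Drop the last dotted component: "function.method" -> "function".
        name, _, _ = name.rpartition(".")
    return fallback


print(face_for("function.method"), face_for("namespace"), face_for("unknown.thing"))
```

Both producers would consult the same table, so a flat LSP token type and a dotted tree-sitter capture end up on the same face whenever they describe the same kind of thing.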

Dimitri Sabadie · Answer 13 · Sat Feb 10 2024 06:32:58 GMT+0800 (China Standard Time)

I’m just joining in this conversation as I’ve been busy with life lately. I’m not sure what to think about all of that. I think that merging everything together (LSP / tree-sitter) is tempting, but is it really sound? I do believe that what makes Kakoune such a good tool is how simple it is, and I think the ecosystem should follow that path. Otherwise, you open the gates for various unrelated requests, such as “Eh, can I have copilot too?” or “Eh, why not add this tronfibulate fuzzy finder that is so useful and…” and you get the picture.

I think that what is lacking currently is a way to share a buffer with external programs. I’ve always thought that, being based on UNIX and POSIX roots, Kakoune should provide some kind of shared-memory access to buffers, read-only or read-write. One of the most annoying things in KTS is that I have to parse buffers all the time by streaming them via FIFOs (created by KTS and exposed to Kakoune). I’m in the process of using deltas and partial updates, but that’s super complex. And I know that this kind of stuff happens in kak-lsp too. And it would happen in anything else requiring access to a buffer.

I’m not sure @mawww has a solution for this. I personally do not for now.

> I very much agree on the idea that merging them would provide a cohesive experience that handles overlap. I’ve never taken the time to set up kak-tree-sitter because I’ve always used kak-lsp and assumed I would experience weirdness by running both at once (I don’t know if that’s even true, but I shied away from even trying it).

I run both, and I have no issue (but I have disabled semantic tokens, so that would probably be something to work on). Also, colorschemes in KTS are… still a thing that is not completely solved (since KTS adds a ton of new faces, it cannot really map to existing, default faces, since the “““standard”””™ list is too slim to provide a good tree-sitter experience).

> If I'm being honest, I consider kak-dap a failed experiment. While I was able to get quite far with just Kakoune buffers, Kakoune ultimately (by design) doesn't offer the feature set necessary to create a robust debug UI from within Kakoune itself.

@jdugan6240 I have thought about this, because I need some UI/UX enhancements for stuff I do in personal scripts. For instance, allowing annotations / phantom text / whatever you want to call it to be added anywhere on the display grid, not only on the buffer. I think that your problem really should prompt some reflection upstream, because we lack a couple of useful UI features that are completely orthogonal to specific tools and would therefore benefit everyone.

> refactor kak-lsp to use kak_command_fifo/kak_response_fifo instead of inflexible single shell processes, to facilitate:
> - try to find a solution that doesn't need shell processes in the common case

In KTS, there is no shell anymore besides when starting a new session. What I do is actually pretty damn simple:

- Start a new session with a shell expansion that calls KTS with its init arguments (nop %sh{ kak-tree-sitter -dks --session $kak_session }).
- When the server is ready, it sends some setup commands to your Kakoune session, setting a global option that tells the session where to send commands to communicate with KTS. It’s basically the paths to two FIFOs.
- Whenever I want to ask something of KTS, I simply use the echo -to-file command for regular commands, and write for streaming buffer content.
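
The server side of the steps above might be sketched as follows. This is not KTS's code: the option names (`demo_cmd_fifo`, `demo_buf_fifo`) and the directory layout are hypothetical, and the sketch only builds the setup commands as text rather than sending them.

```python
import os
import tempfile


def kak_quote(text: str) -> str:
    """Quote a string for Kakoune: single quotes, with inner quotes doubled."""
    return "'" + text.replace("'", "''") + "'"


def prepare_session(runtime_dir: str, session: str) -> str:
    """Create the two FIFOs for a session and return the setup commands as text."""
    session_dir = os.path.join(runtime_dir, session)
    os.makedirs(session_dir, exist_ok=True)
    lines = []
    # Hypothetical option names; a real plugin may use different ones.
    for option, filename in (("demo_cmd_fifo", "commands"), ("demo_buf_fifo", "buffer")):
        path = os.path.join(session_dir, filename)
        if not os.path.exists(path):
            os.mkfifo(path, 0o600)  # readable and writable by the owner only
        lines.append(f"declare-option str {option}")
        lines.append(f"set-option global {option} {kak_quote(path)}")
    return "\n".join(lines)


runtime_dir = tempfile.mkdtemp()
setup = prepare_session(runtime_dir, "demo-session")
print(setup)
# A server could deliver `setup` by piping it to `kak -p demo-session`.
```

Once the options are set, the session can address the server with `echo -to-file` and `write` as described, without spawning a shell per request.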

Something I haven’t figured out (I think it’s just missing from Kakoune) is a way to hook in when a write fails (or at least have a timeout), so that it doesn’t block / freeze Kakoune. Crashing KTS while Kakoune tries to write to its FIFO basically freezes it up (it happens from time to time, as there’s a weird IO bug with the buffer FIFO that I need to fix). When you close the server with ctrl-c for instance, it sends “end of session” requests to all connected sessions, which lets them set the FIFO paths to /dev/null (so that we can stream requests / buffer content without having Kakoune block). But for crashes, I have no clue how to fix it.
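
For reference, this is how a POSIX process can avoid hanging on a FIFO whose reader has died: opening the write end with `O_NONBLOCK` fails immediately with `ENXIO` when nobody holds the read end. The sketch shows the operating-system mechanism only; it is not something Kakoune's `echo -to-file` or `write` is known to do, and the function name is made up for the example.

```python
import errno
import os


def try_write_fifo(path: str, data: bytes) -> bool:
    """Write to a FIFO without blocking; return False if the message is not sent."""
    try:
        # With O_NONBLOCK, open() fails with ENXIO instead of hanging
        # when no process has the FIFO open for reading.
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as err:
        if err.errno == errno.ENXIO:
            return False
        raise
    try:
        # Assumes the message is small enough to fit in the pipe buffer.
        os.write(fd, data)
        return True
    except (BlockingIOError, BrokenPipeError):
        # The pipe is full or the reader vanished between open and write.
        return False
    finally:
        os.close(fd)
```

A caller that gets `False` can fall back to reporting the failure, which is the hook the comment above is asking for.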

Lucas Schwiderski · Answer 14 · Sat Feb 10 2024 19:26:57 GMT+0800 (China Standard Time)

> As clarification for everyone reading along: LSP semantic tokens don't map to raw colors but use standard Kakoune faces.
> This means it will work reasonably well with any colorscheme.
> If tree-sitter does the same, it should fit right in.

A face is just an alias that maps to a color at the time of usage, though (you could do s/color/face/g in my previous comments).
The finicky work is making it so that LSP and tree-sitter produce the same tokens for the same buffer locations, especially because we have little control over what an LSP server produces.

> So one goal here is a good default mapping of "syntax/semantic token" to "Kakoune face" that both LSP and tree-sitter agree on.

Hence why this would largely fall onto tree-sitter, and basically require them to model each language's highlighter queries to match the corresponding language server as closely as possible.

IMO there is no good single set of default tokens in kak-lsp. We'll need one that covers basic and semantic highlighting for "standalone" users, and one that only does semantic stuff for people using tree-sitter (or the regular regex highlighters).
Then tree-sitter would be able to provide the best queries they can, without being tied to how a particular language server works.
