tree-sitter / tree-sitter

An incremental parsing system for programming tools

Home Page: https://tree-sitter.github.io

Grammar library build/install

mattmassicotte opened this issue

Something I do in my own use of tree-sitter is build a standalone library for each grammar I use. This always follows the same pattern: build with npm, use ar to create a static library, and hand-write a .h and a .pc file.

I got to thinking that it might be really convenient for some users to have a Makefile that is conceptually similar to the runtime's. This way, if needed, the library could be compiled, packaged and installed in an automated way.

I thought I'd bring it up in an issue first to see if there's any interest. If so, I could look into writing and adding them for the grammars, and modifying tree-sitter generate to produce them alongside the parser.

What do you think?

I like this idea. It makes sense to make the generated parsers as easily consumable by C-compatible languages as possible.

So what do you think the file structure should be? We have this bindings folder right now, for the files that are specific to certain binding languages. Maybe we should add a c directory there. The Makefile, though, would need to live at the root, like Cargo.toml and package.json. How about these additions?

tree-sitter-go
├── Makefile
└── bindings
    └── c
        ├── tree-sitter-go.pc.in
        └── tree-sitter-go.h
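For concreteness, here is a minimal sketch of what bindings/c/tree-sitter-go.h could contain. It assumes the generated src/parser.c defines the grammar's usual entry point, `tree_sitter_go`; the include-guard name is an arbitrary choice. `TSLanguage` is forward-declared so the header does not have to include the runtime's `tree_sitter/api.h`.

```c
/* bindings/c/tree-sitter-go.h: sketch of a hand-written header for the Go grammar. */
#ifndef TREE_SITTER_GO_H_
#define TREE_SITTER_GO_H_

/* Opaque handle; the full definition lives in the tree-sitter runtime. */
typedef struct TSLanguage TSLanguage;

#ifdef __cplusplus
extern "C" {
#endif

/* Entry point defined in the generated src/parser.c. */
const TSLanguage *tree_sitter_go(void);

#ifdef __cplusplus
}
#endif

#endif /* TREE_SITTER_GO_H_ */
```

A consumer would include this header together with `tree_sitter/api.h` and pass `tree_sitter_go()` to `ts_parser_set_language`. Repeating the identical `typedef` when both headers are included is valid from C11 onward.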

This looks perfect to me!

What do you think is the best way to start? Get something working for one grammar first?

Yeah, if you want to just hand-write these files and open a PR on some Tree-sitter language repository that you're using, we could talk about the files in more detail on that PR. And then we could add some logic to tree-sitter generate to auto-generate similar files going forward.

Great, that's exactly what I'll do.

Thanks so much!

An issue came up related to cross-platform Makefile compatibility. I wanted to get other opinions here on how to proceed. Is it better to try to come up with one, standardized file that works for all kinds of parsers? Or is it OK to specialize the files for C versus C++ parsers (grammars whose external scanner is written in C++)?

See: tree-sitter/tree-sitter-go#56 (comment)

> wanted to get other opinions here on how to proceed. Is it better to try to come up with one, standardized file that works for all kinds of parsers?

I've made this file with that purpose in mind, and maybe it makes more sense to add it here instead: https://github.com/kylo252/nvim-treesitter/blob/robust-makefile/scripts/compile_parsers.makefile

@maxbrunsfeld, would it be possible to use a generic "fallback" makefile that can work for all parsers? The option could even be marked as experimental.

@kylo252 your implementation is great! But your suggestion is better.

It's become quite a chore to merge and maintain the needed bits across all the parsers. The make infrastructure has become (nearly) generic. Maybe there's some way this could be delivered automatically as part of the core tree-sitter components?

I've tried using a couple of these Makefiles to build parsers into RPM packages for Fedora.

Since I'm working with the contents of tarballs rather than Git repositories, the following snippet in tree-sitter-json's Makefile doesn't work:

PARSER_REPO_URL := $(shell git -C $(SRC_DIR) remote get-url origin )

I can work around this by passing a value for PARSER_REPO_URL when calling make:

make PARSER_REPO_URL=https://github.com/tree-sitter/tree-sitter-json/

However, when I tried the same thing with tree-sitter-html, it didn't work, because the Makefile uses ?= rather than :=:

PARSER_REPO_URL ?= $(shell git -C $(SRC_DIR) remote get-url origin )

Instead, the workaround here is to run the following before calling make:

git init
git remote add origin https://github.com/tree-sitter/tree-sitter-html/

This is all doable, but it's not as smooth as it could be. Perhaps these Makefiles should include a fallback, such as the following?

PARSER_REPO_URL := $(shell git -C $(SRC_DIR) remote get-url origin || echo https://github.com/tree-sitter/$$(basename $(CURDIR)) )
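The fallback's behavior can be spelled out step by step. Below is a minimal C sketch of the same decision, not code from any existing Makefile or tool: `parser_repo_url` and `last_component` are hypothetical names, and the sketch assumes a POSIX system where `popen` is available. It asks git first and uses the directory's basename only when git fails (and, as an extra guard, when git prints nothing), mirroring the `||` in the Makefile line.

```c
#define _POSIX_C_SOURCE 200809L /* exposes popen/pclose under strict C modes */
#include <stdio.h>
#include <string.h>

/* Copy the last component of `dir` into `out`, ignoring trailing slashes.
   Plays the role of `$$(basename $(CURDIR))` in the Makefile line. */
static void last_component(const char *dir, char *out, size_t out_len) {
    size_t end = strlen(dir);
    while (end > 1 && dir[end - 1] == '/')
        end--;                              /* drop trailing slashes */
    size_t start = end;
    while (start > 0 && dir[start - 1] != '/')
        start--;                            /* walk back to the previous slash */
    snprintf(out, out_len, "%.*s", (int)(end - start), dir + start);
}

/* Hypothetical helper mirroring the proposed fallback:
     git -C SRC_DIR remote get-url origin || echo https://github.com/tree-sitter/<basename of CURDIR>
   Assumes neither path contains a single quote (no shell escaping is done).
   Returns 1 if the URL came from git, 0 if the directory-name fallback was used. */
int parser_repo_url(const char *src_dir, const char *cur_dir,
                    char *out, size_t out_len) {
    char cmd[4096];
    snprintf(cmd, sizeof cmd, "git -C '%s' remote get-url origin 2>/dev/null", src_dir);

    FILE *proc = popen(cmd, "r");
    if (proc != NULL) {
        int got_line = fgets(out, (int)out_len, proc) != NULL;
        int status = pclose(proc);            /* 0 only if git exited successfully */
        if (got_line && status == 0) {
            out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
            if (out[0] != '\0')
                return 1;                     /* git knew the remote: use it */
        }
    }

    /* Not a Git checkout (e.g. an unpacked tarball), or git is missing. */
    char name[1024];
    last_component(cur_dir, name, sizeof name);
    snprintf(out, out_len, "https://github.com/tree-sitter/%s", name);
    return 0;
}
```

Two assumptions carry over from the Makefile line. The fallback is only right when the directory is named exactly after the repository, so a tarball that unpacks into a directory with a version suffix would produce a wrong URL. And because `$(shell ...)` captures only standard output, git's error message would still be printed by make; the sketch silences it with `2>/dev/null`.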

The work being done here seems like it could go a long way towards addressing the underlying problem: #2438