tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Failed to build wasm

wszgrcy opened this issue · comments

commented

I have built many times,I have succeeded a few times and also failed many times
But every time it takes up a lot of memory, the computer's 64GB of memory is all occupied, and then it crashes

I use npx tree-sitter build-wasm xxx to build branch: master
May I ask how to provide more detailed information to help troubleshoot issues?

Tree-sitter-julia does require memory than other grammars, but this seems excessive. I'll try to troubleshoot and if I find the issue I'll add wasm to CI to avoid regressions.

@savq Do you have any idea how to improve the situation? There is some interest in making a Julia extension for the Zed editor here. It looks like we need a machine with 128 GB RAM just to make the julia.wasm.

I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?

I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?

The operators aren't really a problem. The biggest problem with julia syntax are sequences of expressions without any delimiter: macros, matrices and juxtapositions (sort of).

I made an experiment in this branch, refactoring some stuff and removing those problematic forms entirely. The resulting parser.c is still around 33M, and the wasm build takes around 10 minutes in my machine, but it could be a starting point for the zed extension.

I did some work on tree-sitter to improve this to some degree - I haven’t tested it in the Julia grammar.

tree-sitter/tree-sitter#3234

The operators aren't really a problem.

@savq, I tossed things around again and now I'm pretty sure that the number of operators is at least one of the problems. I missed the tree-sitter generate step 2 weeks ago, sorry.

Today I removed all but ≈ 50 operators. RAM usage went down from 60 GB to 4 GB with tree-sitter v0.20.8. Combined with your "smallish" branch, parser.c is 20 MB. That looks much better!

Unfortunately, I guess Zed requires a newer tree-sitter version (I see errors in Zed's log). Trying to build the WASM (tree-sitter build -w) with the latest tree-sitter v0.22.2 gives me the following error message:

This external scanner uses a symbol that isn't available to wasm parsers.

Missing symbols:
    exit

Available symbols:
    calloc
    free
    iswalnum
    iswalpha
    iswblank
    iswdigit
    iswlower
    iswspace
    iswupper
    iswxdigit
    malloc
    memchr
    memcmp
    memcpy
    memmove
    memset
    realloc
    strcmp
    strlen
    strncpy
    towlower
    towupper

What does that mean? The only exits I can find:

> grep -R exit
./src/scanner.c:  if (arr == NULL) exit(1);
./src/scanner.c:  if (stack == NULL) exit(1);
./src/scanner.c:  if (stack->len >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) exit(1);
./src/scanner.c:  if (stack->len == 0) exit(1);

Exactly what it says on the tin: the exit function from libc is not available in wasm. #136 fixes this (among other things) by using the built-in abort instead.

@clason: Thank you, I'd assumed scanner.c was auto-generated. I replaced the four instances of exit(0) with abort() and it built.

This is my setup:

  • clone the current tree-sitter-julia master
  • remove most operators
  • edit scanner.c as above
  • use tree-sitter v0.22.2
  • tree-sitter generate consumes 3 GB RAM and takes 2 mins on my aging MacBook
  • tree-sitter build -w consumes 4 GB RAM and takes 2 mins
  • parser.c is 54 MB and julia.wasm is 5.8 MB
  • tree-sitter test: most tests pass, 6 fail, possibly due to the removed operators
  • wrap julia.wasm in a local Zed extension
  • comment out quite a few lines in highlights.scm and the other .scm files that cause errors in Zed's log

In Zed, the syntax tree for some of my .jl files looks pretty clean considering all the hacks above. Basic syntax highlighting works.

I'd call it progress but it's certainly not safe for public consumption. I'll wait for #136 getting merged and try again.

I'd assumed scanner.c was auto-generated.

It is not; it needs to be hand-written. And, yes, you should absolutely wait for the mentioned PRs to be merged.