tree-sitter / tree-sitter-javascript

Javascript grammar for tree-sitter

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bug: 'class_static_block' cant contain 'extras' like newlines

jackschu opened this issue · comments

Repro

The following piece of code is valid but it is parsed incorrectly:

class Foo {
    static
    //hi

  {
     1 + 1;
  }
};

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript:
https://www.typescriptlang.org/play?#code/MYGwhgzhAEBiD29oG8BQ0PQgFzNglsOpgPQkAW+qxamGAjNANTT0DcxAvqpx0A

The output of tree-sitter parse is the following:

(program [0, 0] - [8, 0]
  (class_declaration [0, 0] - [2, 8]
    name: (identifier [0, 6] - [0, 9])
    body: (class_body [0, 10] - [2, 8]
      member: (field_definition [1, 4] - [1, 10]
        property: (property_identifier [1, 4] - [1, 10]))
      (comment [2, 4] - [2, 8])))
  (statement_block [4, 2] - [6, 3]
    (expression_statement [5, 5] - [5, 11]
      (binary_expression [5, 5] - [5, 10]
        left: (number [5, 5] - [5, 6])
        right: (number [5, 9] - [5, 10]))))
  (ERROR [7, 0] - [7, 1])
  (empty_statement [7, 1] - [7, 2]))
../../open_source/prettier/tests/format/js/class-static-block/with-line-breaks.js	0 ms	(MISSING "}" [2, 8] - [2, 8])

this is on commit ac10a11e0c8db512f70e6b798260d2516d22454c

Analysis

I did a bunch of digging, it seems like this happens because the language believe 'static' is the start of a property name
I think tree sitter here should have decided that the class_static_block and field_definition here represent a conflict and gone to GLR, but somehow the presence of $._semicolon confuses it

    class_body: $ => seq(
      '{',
      repeat(choice(
        seq(field('member', $.method_definition), optional(';')),
        seq(field('member', $.field_definition), $._semicolon),
        field('member', $.class_static_block),
        field('template', $.glimmer_template),
        ';',
      )),
      '}',
    ),

I tried adding to conflicts after removing optional(';') and $._semicolon, the former is clearly redundant, but the latter is load bearing because the spec requires that semicolon or else statements like this would be considered valid

class Bar { bar = 4 baz = 4 }

I think the best way to fix this was just to allow an optional "automatic semicolon" after 'static', creating a conflict and letting GLR resolve it, thanks for the report

hey @amaanq it seems like #314 will cause treesitter to parse the following block without emitting an error

class Foo {
    static ; {}
}

But a JS runtime would give Unexpected token '{' is this OK?
I've vaguely heard that treesitter is allowed to parse more than what the spec allows, but I'm curious how you think about it

you're right, sorry about that! it should be just _automatic_semicolon, should be fixed in a sec