tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter

Home Page: https://www.npmjs.com/package/tree-sitter

terminated by signal SIGSEGV (Address boundary error)

talbergs opened this issue

I am no expert in Node.js. This happens to me when running larger queries. Is this a bug, or does it mean I need some special Node.js configuration?

Here are the last few lines of a node --trace ./app.js run:

   7:       ~get+0(this=0x2ddfdaa47eb9 <Object map = 0xcf571460d61>) {
   7:       } -> 0x0f3686113a21 <Object map = 0xcf57145bd81>
   7:       ~query+0(this=0x0f0cf6de6239 <MainContext map = 0xcf571467019>, 0x1f06e37f4519 <String[#3]: php>, 0x0136007fb1b1 <String[#342]\: \n      (\n        expression_statement (\n          assignment_expression\n          left: (variable_name (name) @var-name)\n        )\n      )\n      (\n        function_call_expression\n        function: (\n          qualified_name (name) @fn-name\n        )\n        arguments: (\n          arguments (variable_name (name) @paa)\n        )\n      )\n    >) {
   8:        ~getSyntax+0(this=0x0f0cf6de6239 <MainContext map = 0xcf571467019>, 0x1f06e37f4519 <String[#3]: php>) {
   9:         ~getSource+0(this=0x0f0cf6de5e09 <FileContext map = 0xcf571466e69>) {
  10:          ~getBuffer+0(this=0x0f0cf6de5e09 <FileContext map = 0xcf571466e69>) {
  10:          } -> 0x0f0cf6de5cb9 <Uint8Array map = 0x39d010521199>
  10:          ~toString+3(this=0x0f0cf6de5cb9 <Uint8Array map = 0x39d010521199>, 0x0c0aaa780471 <undefined>, 0x0c0aaa780471 <undefined>, 0x0c0aaa780471 <undefined>) {
  10:          } -> 0x0f0cf6de65e1 <String[169]\: <?php\n$fields = [\n"000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000"\n];\ncheck_fields($fields);>
   9:         } -> 0x0f0cf6de65e1 <String[169]\: <?php\n$fields = [\n"000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000"\n];\ncheck_fields($fields);>
   9:         new ~Parser+0(this=0x0f0cf6de6859 <Parser map = 0xcf571467061>, 0x0f0cf6de65e1 <String[169]\: <?php\n$fields = [\n"000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000"\n];\ncheck_fields($fields);>, 0x0f0cf6dca499 <Object map = 0xcf571465ea9>) {
  10:          ~loadChain+0(this=0x0f0cf6de6859 <Parser map = 0xcf5714671c9>, 0x0f0cf6dca499 <Object map = 0xcf571465ea9>, 0x0c0aaa780471 <undefined>) {
  11:           ~parse+0(this=0x0f0cf6de6859 <Parser map = 0xcf5714671c9>, 0x1f06e37f4519 <String[#3]: php>, 0x0c0aaa780471 <undefined>) {
  12:            ~Parser.setLanguage+0(this=0x0f0cf6de68e9 <Parser map = 0xcf571457461>, 0x0f36861160c9 <Language map = 0xcf571459ae9>) {
  13:             ~initializeLanguageNodeClasses+20(this=0x3ac530682409 <JSGlobal Object>, 0x0f36861160c9 <Language map = 0xcf571459ae9>) {
  13:             } -> 0x0c0aaa780471 <undefined>
  12:            } -> 0x0f0cf6de68e9 <Parser map = 0xcf571467211>
  12:            ~Parser.parse+0(this=0x0f0cf6de68e9 <Parser map = 0xcf571467211>, 0x0f0cf6de65e1 <String[169]\: <?php\n$fields = [\n"000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000"\n];\ncheck_fields($fields);>, 0x0c0aaa780471 <undefined>, 0x0c0aaa780471 <undefined>) {
  13:             ~input+0(this=0x3ac530682409 <JSGlobal Object>, 0, 0x0f0cf6df6e39 <Object map = 0xcf571467331>) {
  13:             } -> 0x0f0cf6de65e1 <String[169]\: <?php\n$fields = [\n"000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000"\n];\ncheck_fields($fields);>
  13:             ~input+0(this=0x3ac530682409 <JSGlobal Object>, 169, 0x0f0cf6df6ee9 <Object map = 0xcf571467331>) {
  13:             } -> 0x0c0aaa7806d1 <String[#0]: >
  13:             ~Parser.getLanguage+0(this=0x0f0cf6de68e9 <Parser map = 0xcf571467211>, 0x0c0aaa780471 <undefined>) {
  13:             } -> 0x0f36861160c9 <Language map = 0xcf5714672a1>
  12:            } -> 0x0f0cf6df6f41 <Tree map = 0xcf571467451>
  11:           } -> 0x0f0cf6df6f41 <Tree map = 0xcf571467451>
  10:          } -> 0x0c0aaa780471 <undefined>
   9:         } -> 0x0c0aaa780471 <undefined>
   9:         ~get+0(this=0x0f0cf6de6859 <Parser map = 0xcf5714671c9>, 0x1f06e37f4519 <String[#3]: php>) {
  10:          ~get+0(this=0x0f0cf6df6f41 <Tree map = 0xcf571467451>) {
  11:           ~unmarshalNode+0(this=0x3ac530682409 <JSGlobal Object>, 154, 0x0f0cf6df6f41 <Tree map = 0xcf571467451>, 0x0c0aaa780471 <undefined>, 0x0c0aaa780471 <undefined>) {
  12:            ~getID+0(this=0x3ac530682409 <JSGlobal Object>, 0x21018cdb07d1 <Uint32Array map = 0x39d010500f31>, 0) {
  12:            } -> 0x0f0cf6df72c1 <BigInt 93906111597824>
  12:            new ~SyntaxNode+0(this=0x0f0cf6df72e1 <SyntaxNode map = 0xcf5714674e1>, 0x0f0cf6df6f41 <Tree map = 0xcf571467451>) {
  12:            } -> 0x0c0aaa780471 <undefined>
  11:           } -> 0x0f0cf6df72e1 <SyntaxNode map = 0xcf571467571>
  10:          } -> 0x0f0cf6df72e1 <SyntaxNode map = 0xcf571467571>
   9:         } -> 0x0f0cf6df72e1 <SyntaxNode map = 0xcf571467571>
   8:        } -> 0x0f0cf6df72e1 <SyntaxNode map = 0xcf571467571>
   8:        ~Query._init+0(this=0x0f0cf6df7bf1 <Query map = 0xcf571457971>) {

Thanks for the report. I think it’s a bug. What language are you parsing? Is the grammar open source? It’d be great to get a reproducible script that causes this.

I am parsing PHP using this grammar:

tree-sitter-php@^0.16.2:
  version "0.16.2"
  resolved "https://registry.yarnpkg.com/tree-sitter-php/-/tree-sitter-php-0.16.2.tgz#15c48dbd44cc56c4660d48ef883c9fc0f3e0d35b"
  integrity sha512-BkewhybED1xRQkDpmXkjpBZ1OdnWcmjyu3tGRsoiFqbeNeDtWxac2wzpBdkQr+aSUKlJoCkpBFytM1KcQa5SoA==
  dependencies:
    nan "^2.14.0"

[EDIT] For the record, the node-tree-sitter version in use:

tree-sitter@^0.17.1:
  version "0.17.1"
  resolved "https://registry.yarnpkg.com/tree-sitter/-/tree-sitter-0.17.1.tgz#821c5a4ac1afdb623d63f5ffc7916663e732a95c"
  integrity sha512-obIe804bwfAGFMhTjQz0NXF75GDupCVXo7Sv0NVVdA3s/Q4ZI4mdirIN8cpw6bVhz/K1qgUdEuI3SEoOE/q75A==
  dependencies:
    nan "^2.14.0"
    prebuild-install "^5.0.0"
> node --version
v14.8.0

This snippet currently reproduces the error. Delete the last element from the "keywords" array and the error goes away.

const Parser = require('tree-sitter')
const PHP = require('tree-sitter-php')

const parser = new Parser()
parser.setLanguage(PHP)

const tree = parser.parse('<?php //')

const keywords = [
  'empty_statement',
  'named_label_statement',
  'expression_statement',
  'if_statement',
  'switch_statement',
  'while_statement',
  'do_statement',
  'for_statement',
  'foreach_statement',
  'goto_statement',
  'continue_statement',
  'break_statement',
];

const query = keywords.reduce((prev, curr) => {
  return prev + `(${curr}) @statement`
}, '');

(new Parser.Query(PHP, query)).matches(tree.rootNode)

I can't reproduce the problem using this script. What platform are you on?

> uname -a
Linux hoste 5.8.3-arch1-1 #1 SMP PREEMPT Fri, 21 Aug 2020 16:54:16 +0000 x86_64 GNU/Linux

Maybe try a somewhat larger query on your machine, such as:

const keywords = [
	'empty_statement',
	'compound_statement',
	'named_label_statement',
	'expression_statement',
	'if_statement',
	'switch_statement',
	'while_statement',
	'do_statement',
	'for_statement',
	'foreach_statement',
	'goto_statement',
	'continue_statement',
	'break_statement',
	'return_statement',
	'throw_statement',
	'try_statement',
	'declare_statement',
	'echo_statement',
	'unset_statement',
	'const_declaration',
	'function_definition',
	'class_declaration',
	'interface_declaration',
	'trait_declaration',
	'namespace_definition',
	'namespace_use_declaration',
	'global_declaration',
	'function_static_declaration',
];

Hmm, I still can't reproduce it. I also tried repeating the entire keywords list until it was ~500 lines long, and substituting some larger PHP source code for the text. Still runs ok on macOS.

If you get a chance, could you rebuild the tree-sitter module in debug mode, and run this script with a debugger?

To rebuild the module:

npm install -g node-gyp
cd node_modules/tree-sitter
node-gyp rebuild --debug

Then, to run:

lldb node -- test.js

Since the trace ends at Query._init, I would recommend setting a breakpoint in Query::GetPredicates, which is the only native function called by _init:

(lldb) breakpoint set -n Query::GetPredicates
(lldb) run

Thank you for helping me out!
Here is the debugging session for the snippet from above:

05:11:00, /tmp/preview
> lldb node -- index.js
(lldb) target create "node"
Current executable set to 'node' (x86_64).
(lldb) settings set -- target.run-args  "index.js"
(lldb) breakpoint set -n Query::GetPredicates
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 107097 launched: '/usr/bin/node' (x86_64)
1 location added to breakpoint 1
Process 107097 stopped
* thread #1, name = 'node', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff46678a4 tree_sitter_runtime_binding.node`node_tree_sitter::Query::GetPredicates(info=0x00007fffffffc3d0) at query.cc:145:48
   142 	}
   143 	
   144 	void Query::GetPredicates(const Nan::FunctionCallbackInfo<Value> &info) {
-> 145 	  Query *query = Query::UnwrapQuery(info.This());
   146 	  auto ts_query = query->query_;
   147 	
   148 	  auto pattern_len = ts_query_pattern_count(ts_query);
(lldb) n
Process 107097 stopped
* thread #1, name = 'node', stop reason = step over
    frame #0: 0x00007ffff46678d5 tree_sitter_runtime_binding.node`node_tree_sitter::Query::GetPredicates(info=0x00007fffffffc3d0) at query.cc:146:8
   143 	
   144 	void Query::GetPredicates(const Nan::FunctionCallbackInfo<Value> &info) {
   145 	  Query *query = Query::UnwrapQuery(info.This());
-> 146 	  auto ts_query = query->query_;
   147 	
   148 	  auto pattern_len = ts_query_pattern_count(ts_query);
   149 	
(lldb) n
Process 107097 stopped
* thread #1, name = 'node', stop reason = step over
    frame #0: 0x00007ffff46678e1 tree_sitter_runtime_binding.node`node_tree_sitter::Query::GetPredicates(info=0x00007fffffffc3d0) at query.cc:148:44
   145 	  Query *query = Query::UnwrapQuery(info.This());
   146 	  auto ts_query = query->query_;
   147 	
-> 148 	  auto pattern_len = ts_query_pattern_count(ts_query);
   149 	
   150 	  Local<Array> js_predicates = Nan::New<Array>();
   151 	
(lldb) s
Process 107097 stopped
* thread #1, name = 'node', stop reason = step in
    frame #0: 0x00007ffff46830cc tree_sitter_runtime_binding.node`ts_query_pattern_count(self=0x0000000000000000) at query.c:2077:24
   2074	}
   2075	
   2076	uint32_t ts_query_pattern_count(const TSQuery *self) {
-> 2077	  return self->patterns.size;
   2078	}
   2079	
   2080	uint32_t ts_query_capture_count(const TSQuery *self) {
(lldb) s
Process 107097 stopped
* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x78)
    frame #0: 0x00007ffff46830d0 tree_sitter_runtime_binding.node`ts_query_pattern_count(self=0x0000000000000000) at query.c:2077:24
   2074	}
   2075	
   2076	uint32_t ts_query_pattern_count(const TSQuery *self) {
-> 2077	  return self->patterns.size;
   2078	}
   2079	
   2080	uint32_t ts_query_capture_count(const TSQuery *self) {
(lldb) s
Process 107097 stopped
* thread #1, name = 'node', stop reason = unknown crash reason
    frame #0: 0x00007ffff46830d0 tree_sitter_runtime_binding.node`ts_query_pattern_count(self=0x0000000000000000) at query.c:2077:24
   2074	}
   2075	
   2076	uint32_t ts_query_pattern_count(const TSQuery *self) {
-> 2077	  return self->patterns.size;
   2078	}
   2079	
   2080	uint32_t ts_query_capture_count(const TSQuery *self) {
(lldb) s
Process 107097 exited with status = 11 (0x0000000b) 
(lldb) s
error: invalid thread
(lldb) 

Also, while this issue was open (after you said you could not reproduce it), I updated Node.js 14 -> 15 and clang 10 -> 11.
Since then, the failure alternates between the SIGSEGV (more often) and a Query error (less often):

05:15:57, last:139, /tmp/preview
> node index.js
fish: “node index.js” terminated by signal SIGSEGV (Address boundary error)
05:15:58, last:139, /tmp/preview
> node index.js
/tmp/preview/index.js:28
(new Parser.Query(PHP, query)).matches(tree.rootNode)
 ^

Error: Query error of type TSQueryErrorNodeType at position 317
    at Object.<anonymous> (/tmp/preview/index.js:28:2)
    at Module._compile (node:internal/modules/cjs/loader:1108:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
    at Module.load (node:internal/modules/cjs/loader:973:32)
    at Function.Module._load (node:internal/modules/cjs/loader:813:14)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
    at node:internal/main/run_main_module:17:47
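
For anyone trying to interpret the position in that message, here is a minimal sketch. It assumes index.js is the first reproduction script above (its line 28 is the Query line, which matches the stack trace) and that the reported position is a zero-based offset into the query string. locatePattern is a hypothetical helper, not part of node-tree-sitter; it only maps an offset back to the pattern that contains it and does not need the native module.

```javascript
// Same keyword list and query construction as the first reproduction script.
const keywords = [
  'empty_statement',
  'named_label_statement',
  'expression_statement',
  'if_statement',
  'switch_statement',
  'while_statement',
  'do_statement',
  'for_statement',
  'foreach_statement',
  'goto_statement',
  'continue_statement',
  'break_statement',
];

const patterns = keywords.map((name) => `(${name}) @statement`);
const query = patterns.join('');

// Hypothetical helper: find the pattern that contains a given offset.
function locatePattern(patterns, position) {
  let start = 0;
  for (let index = 0; index < patterns.length; index++) {
    const end = start + patterns[index].length;
    if (position >= start && position < end) {
      return { index, start, text: patterns[index] };
    }
    start = end;
  }
  return null; // the offset lies outside the query string
}

const hit = locatePattern(patterns, 317);
console.log(query.length, hit);
```

Under those assumptions, offset 317 falls inside the last pattern, (break_statement) @statement, which starts at offset 316. That would be consistent with the error going away when the last element is removed, but it does not by itself explain why the failure alternates with the SIGSEGV.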

So do we mark this as a bug?

I have been able to reliably reproduce a seg fault by passing a plain string to a query instead of an S-expression, as in new Query(Ruby, "oops");
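
Until the binding rejects such input itself, a crude pre-flight check in plain JavaScript might keep obviously malformed strings away from the native constructor. This is only a sketch: looksLikeQuery is a hypothetical helper, not a node-tree-sitter API. It checks that the string contains at least one parenthesized pattern and that parentheses outside string literals are balanced. It knows nothing about node types, so it would not catch the PHP or Ruby cases in this thread, and it would reject valid queries made only of anonymous-node strings or bracketed alternatives.

```javascript
// Hypothetical pre-flight check; not part of node-tree-sitter.
// Returns true when the source contains at least one parenthesized pattern
// and every parenthesis outside a string literal is balanced.
function looksLikeQuery(source) {
  let depth = 0;
  let sawParen = false;
  let inString = false;

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === '\\') {
        i++; // skip the escaped character
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '(') {
      depth++;
      sawParen = true;
    } else if (ch === ')') {
      depth--;
      if (depth < 0) return false; // closing paren with no opener
    }
  }
  return sawParen && depth === 0 && !inString;
}

console.log(looksLikeQuery('oops'));
console.log(looksLikeQuery('(identifier) @name'));
```

A caller could throw an ordinary Error when the check fails instead of calling new Query(...). Parentheses inside query comments are not handled, so treat a failing check as a hint rather than a verdict.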

I added this to the top of my test:

var SegfaultHandler = require('segfault-handler');
SegfaultHandler.registerHandler("crash.log");
➜  beacon-scripts git:(DASHI-677-linter-part-3) ✗ yarn test nDriver
yarn run v1.19.1
$ jest nDriver

 RUNS  packages/i18n/src/drivers/translationDriverTests.test.js
PID 30110 received SIGSEGV for address: 0x78
0   segfault-handler.node               0x00000001046bbfb0 _ZL16segfault_handleriP9__siginfoPv + 304
1   libsystem_platform.dylib            0x00007fff7043a5fd _sigtramp + 29
2   ???                                 0x0000000000000000 0x0 + 0
3   tree_sitter_runtime_binding.node    0x0000000104e627bd _ZN16node_tree_sitter5Query13GetPredicatesERKN3Nan20FunctionCallbackInfoIN2v85ValueEEE + 61
4   tree_sitter_runtime_binding.node    0x0000000104e5974d _ZN3Nan3impL23FunctionCallbackWrapperERKN2v820FunctionCallbackInfoINS1_5ValueEEE + 189
5   node                                0x0000000100257af8 _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE + 616
6   node                                0x000000010025708c _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE + 524
7   node                                0x00000001002567f2 _ZN2v88internalL26Builtin_Impl_HandleApiCallENS0_16BuiltinArgumentsEPNS0_7IsolateE + 258
8   node                                0x0000000100a6fbb9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit + 57

Also note that I arrived here because of jestjs/jest#8769 (comment)

I have a more pernicious seg fault that is ONLY triggered in a Docker environment, which has more limited memory. It's possible that this bug is separate from the one above, and is triggered by an out-of-memory error in Query.

Reproduce case:

  1. Use the Ruby language.
  2. Try to initialize this query:
    (assignment
      left: (_) @var
      right: (method_call
        method: (
          call receiver: (constant) @class
          (#eq? @class "I18n")
          method: (identifier) @method
          (#eq? @method "namespace")
        )
        arguments: (
          argument_list (
            (string) @namespace
          )
        )
      )
    )

as in

new Query(Ruby, `    (assignment
      left: (_) @var
      right: (method_call
        method: (
          call receiver: (constant) @class
          (#eq? @class "I18n")
          method: (identifier) @method
          (#eq? @method "namespace")
        )
        arguments: (
          argument_list (
            (string) @namespace
          )
        )
      )
    )
`)

This fails ONLY in a Docker context, and only inside a Jest test. Note that the seg fault occurs in lib_pthread, so this is a threading issue. I am almost certain the NAPI PR will fix this.

const Parser = require("tree-sitter");
const Ruby = require("tree-sitter-ruby");
const { Query } = Parser;

describe("createQuery", () => {
  it("?", () => {
    new Query(Ruby, `
      (assignment
        left: (_) @var
        right: (method_call
          method: (
            call receiver: (constant) @class
            (#eq? @class "I18n")
            method: (identifier) @method
            (#eq? @method "namespace")
          )
          arguments: (
            argument_list (
              (string) @namespace
            )
          )
        )
      )
    `);
  });
});

Further context: the lines

            call receiver: (constant) @class
            (#eq? @class "I18n")
            method: (identifier) @method
            (#eq? @method "namespace")

are responsible. If I remove either

            method: (identifier) @method
            (#eq? @method "namespace")

or

            call receiver: (constant) @class
            (#eq? @class "I18n")

OR remove both predicates (the (#eq? ...) forms),

then the seg fault disappears.
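
Because a seg fault takes the whole Jest worker down with it, one way to narrow this kind of failure is to run each query variant in a separate Node.js process and look at how that process ended. The sketch below uses only child_process from the standard library. runIsolated is a hypothetical helper, and the snippet passed to it is a stand-in that sends itself SIGSEGV; in a real check it would require tree-sitter and tree-sitter-ruby and construct the Query.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a snippet in a separate Node.js process
// and report how that process ended.
function runIsolated(source) {
  const result = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' });
  return {
    status: result.status, // exit code, or null if the child was killed by a signal
    signal: result.signal, // signal name such as 'SIGSEGV', or null on a normal exit
    stderr: result.stderr,
  };
}

// Stand-in for the crashing call: the child sends itself SIGSEGV.
const crashed = runIsolated("process.kill(process.pid, 'SIGSEGV')");

// A child that exits normally, for comparison.
const ok = runIsolated('process.exit(0)');

console.log(crashed.signal, crashed.status, ok.signal, ok.status);
```

With this in place the parent process survives, so a test can assert on the signal field for each variant (both predicates, one predicate, none) instead of crashing on the first one. Note that running in a child process may also change the conditions that trigger a Docker-only or Jest-only failure.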

Thank you @cellog!

"I am almost certain the NAPI PR will fix this"

What is the NAPI PR?

I wanted to confirm that #81 actually solves this issue for me, so that this thread can be closed, but when building the master branch I get this error:

make: Entering directory '/home/ada/any-style-new/any-style/node_modules/tree-sitter/build'
make: *** No rule to make target 'Release/obj.target/tree_sitter/vendor/tree-sitter/lib/src/lib.o', needed by 'Release/obj.target/tree_sitter.a'.  Stop.
make: Leaving directory '/home/ada/any-style-new/any-style/node_modules/tree-sitter/build'
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/home/ada/any-style-new/any-style/node_modules/tree-sitter/node_modules/node-gyp/lib/build.js:194:23)
gyp ERR! stack     at ChildProcess.emit (node:events:378:20)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)
gyp ERR! System Linux 5.10.13-arch1-1
gyp ERR! command "/usr/bin/node" "/home/ada/any-style-new/any-style/node_modules/tree-sitter/node_modules/.bin/node-gyp" "rebuild"
gyp ERR! cwd /home/ada/any-style-new/any-style/node_modules/tree-sitter
gyp ERR! node -v v15.11.0
gyp ERR! node-gyp -v v6.1.0
gyp ERR! not ok 

Once again, I am not sure whether this is my fault or not.

When will #81 be released?

@talbergs

FWIW and in case anyone else runs into this, make sure to clone with submodules:

git clone --recurse-submodules
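
If the checkout already exists, a quick way to tell whether the submodule is missing is to look for the vendored sources before running node-gyp. This is a sketch under assumptions: the vendor/tree-sitter/lib/src directory comes from the make error above, the lib.c file name is inferred from the lib.o target, and hasVendoredTreeSitter is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: the directory is taken from the make error
// (vendor/tree-sitter/lib/src/lib.o); the lib.c file name is inferred from it.
const VENDORED_SOURCE = path.join('vendor', 'tree-sitter', 'lib', 'src', 'lib.c');

// Hypothetical helper: returns true when the submodule looks checked out.
function hasVendoredTreeSitter(checkoutDir) {
  return fs.existsSync(path.join(checkoutDir, VENDORED_SOURCE));
}

// Check the directory given on the command line, or the current directory.
const checkoutDir = process.argv[2] || process.cwd();
if (!hasVendoredTreeSitter(checkoutDir)) {
  console.error(
    `Missing ${VENDORED_SOURCE} in ${checkoutDir}; ` +
      'try: git submodule update --init --recursive'
  );
}
```

git submodule update --init --recursive fetches the submodules for a clone that was made without --recurse-submodules, so there is no need to clone again.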