tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter

Home Page:https://www.npmjs.com/package/tree-sitter

Query to match all declarations (def) and identifiers (uses) in the code

Symbolk opened this issue

I am using node-tree-sitter to query all def-uses in the code, in the following way:

import * as TreeSitter from 'tree-sitter'
import { Identifier } from './Identifier'
const TypeScript = require('tree-sitter-typescript').typescript
const treeSitter = new TreeSitter()
treeSitter.setLanguage(TypeScript)

 private static analyzeCode(codeLines: string[]): Identifier[] {
   const sourceCode = codeLines.join('\n')
   const tree = treeSitter.parse(sourceCode)
   const query = new TreeSitter.Query(TypeScript, `(identifier) @element`)

   const identifiers: Identifier[] = []
   const matches: TreeSitter.QueryMatch[] = query.matches(tree.rootNode)
   for (const match of matches) {
     for (const capture of match.captures) {
       // capture.name is the capture label ("element"); the text is the matched source
       identifiers.push(new Identifier(capture.name, tree.getText(capture.node)))
     }
   }
   return identifiers
 }

However, it also returns keywords that I do not want. For example, for the code:

    private orgLines: string[] = [];

it returns:

['public', 'string', '']

After reading the query syntax (http://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) and the test code (https://github.com/tree-sitter/node-tree-sitter/blob/master/test/query_test.js), I am wondering:

  1. Is it possible to query for all the subtypes of declarations with the wildcard: (_declaration: name (identifier))?
  2. Is it correct to filter the language-specific keywords from matches: (_) name: (identifier)?

Could you provide a complete snippet of code and a query that work in Tree-sitter playground?

Here they are:

Code in TypeScript:

export class Conflict {
    public hasOriginal: boolean = false;

    private textAfterMarkerOurs: string | undefined = undefined;
    private textAfterMarkerOriginal: string | undefined = undefined;
    private textAfterMarkerTheirs: string | undefined = undefined;
    private textAfterMarkerEnd: string | undefined = undefined;

    public static parse(text: string): ISection[] {
        const sections: ISection[] = getSections();
        const lines: string[] = Parser.getLines(text);

        let state: ParserState = ParserState.OutsideConflict;
        let currentConflict: Conflict | undefined = undefined;
        let currentTextLines: string[] = [];
    }
}

Query to get some def-uses:

(method_definition name: (property_identifier) @fn-def)
(class_declaration name: (type_identifier) @class-def)
(public_field_definition name: (property_identifier) @field-def)
(variable_declarator name: (identifier) @var-def)
(call_expression
  function: [
    (identifier) @function
    (member_expression
      property: (property_identifier) @method)
  ])
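
Since the capture names in this query already distinguish definitions (the ones ending in -def) from uses (@function, @method), the captures could be sorted into two lists after the query runs. The following is only a minimal sketch: `CaptureLike` is a hypothetical stand-in assumed to mirror the `name` and `node.text` fields of a node-tree-sitter capture, `splitDefUses` is a made-up helper name, and the sample data is hand-written rather than real query output.

```typescript
// Hypothetical stand-in for a query capture; assumed to mirror the
// `name` and `node.text` fields of a node-tree-sitter QueryCapture.
interface CaptureLike {
  name: string;
  node: { text: string };
}

interface Entry {
  kind: string; // capture name, e.g. "class-def" or "method"
  text: string; // source text of the captured node
}

interface DefUses {
  defs: Entry[];
  uses: Entry[];
}

// Hypothetical helper: captures whose name ends in "-def" are treated as
// definitions; every other capture (@function, @method) is treated as a use.
function splitDefUses(captures: CaptureLike[]): DefUses {
  const result: DefUses = { defs: [], uses: [] };
  for (const capture of captures) {
    const entry: Entry = { kind: capture.name, text: capture.node.text };
    if (capture.name.endsWith("-def")) {
      result.defs.push(entry);
    } else {
      result.uses.push(entry);
    }
  }
  return result;
}

// Hand-written sample captures (not real query output), using names
// that appear in the TypeScript code above.
const sampleCaptures: CaptureLike[] = [
  { name: "class-def", node: { text: "Conflict" } },
  { name: "field-def", node: { text: "hasOriginal" } },
  { name: "fn-def", node: { text: "parse" } },
  { name: "var-def", node: { text: "sections" } },
  { name: "function", node: { text: "getSections" } },
  { name: "method", node: { text: "getLines" } },
];

const sampleResult = splitDefUses(sampleCaptures);
console.log(sampleResult.defs.map((d) => d.text));
console.log(sampleResult.uses.map((u) => u.text));
```

With real data, the input would be the `match.captures` arrays collected from `query.matches(tree.rootNode)`, as in the first snippet of this issue.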

I have noticed that for each language, the query is different. For TypeScript here, is it possible to have a simpler query to match all defs and uses? (Maybe it is not a good idea; I feel that writing such queries is tedious, but they are clear to read!)

I have noticed that for each language, the query is different

I saw somewhere in the issues the author's thoughts that there may be future work on standardization to make queries portable. For now, every language grammar is defined in its own terms, which requires different queries.

is it possible to have a simpler query to match all defs and uses?

It's better to define a series of small queries organized in a batch than to put everything into one big query. Tree-sitter's query engine executes all queries in a batch concurrently, and this also makes it possible to combine small queries in different ways.
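
One way to organize small queries as a batch is to keep each pattern as its own string and join a chosen subset into a single query source. This is a rough sketch, not an established convention: the `PATTERNS` keys and the `buildQuerySource` helper are hypothetical names, and the patterns are copied from the query posted earlier in this thread.

```typescript
// Each entry is one small, independently readable pattern. The keys are
// hypothetical labels chosen for illustration.
const PATTERNS: Record<string, string> = {
  classDef: "(class_declaration name: (type_identifier) @class-def)",
  methodDef: "(method_definition name: (property_identifier) @fn-def)",
  fieldDef: "(public_field_definition name: (property_identifier) @field-def)",
  varDef: "(variable_declarator name: (identifier) @var-def)",
  call:
    "(call_expression function: [(identifier) @function " +
    "(member_expression property: (property_identifier) @method)])",
};

// Hypothetical helper: joins the selected patterns into one query source,
// one pattern per line, and rejects labels that are not defined above.
function buildQuerySource(labels: string[]): string {
  return labels
    .map((label) => {
      const pattern = PATTERNS[label];
      if (pattern === undefined) {
        throw new Error(`unknown pattern: ${label}`);
      }
      return pattern;
    })
    .join("\n");
}

// Two different combinations built from the same small patterns.
const defsOnly = buildQuerySource(["classDef", "methodDef", "fieldDef", "varDef"]);
const usesOnly = buildQuerySource(["call"]);

console.log(defsOnly);
console.log(usesOnly);
```

Either string could then be passed as the second argument to `new TreeSitter.Query(TypeScript, ...)`, in the same way as the `(identifier) @element` query in the original snippet.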