litesolutions/sonar-sslr-grappa

Read me first

This package is licensed under the Apache Software License, version 2.0.

This package requires Java 8.

Please see the file LICENSE for more details.

What this is

This package is a simple framework for writing a language plugin for SonarQube. At its core, it uses parsers written with grappa.

Requires Java 8+.

The current version is 0.6.2:

groupId: org.litesolutions;
artifactId: sonar-sslr-grappa;
version: 0.6.2.

About SSLR

Principle

SSLR means SonarSource Language Recognizer. This is the framework used by SonarQube to match and tokenize languages and build an AST (Abstract Syntax Tree).

An AST in Sonar is a sequence of AstNode instances; those same AstNodes are what your language checks will subscribe to and perform checks upon the Token associated with this node (or its parent(s), sibling(s), child(ren) etc).

The Sonar API provides two mechanisms with which you can produce an AST.

Using a `LexerlessGrammarBuilder`

Using this method, an SSLR grammar takes upon itself to guide both the token/AST node production and parsing of the input text. Using this method, terminals (that is, grammar rules which do not depend on any other rules) are sequences of text in the input.

While very flexible, writing such a grammar is very involved; more often than not, for languages even moderately complex, this means implementing quite a few helper classes to properly guide the parsing process.

Two examples of Sonar language plugins written using this technique are the Java and JavaScript language plugins.

Using a `LexerfulGrammarBuilder`

Using this method, your grammar rules are purely declarative in that their terminals are AstNode instances (more often than not instances of TokenType), not input text. And matching the input text to produce tokens is delegated to another mechanism, and this other mechanism is channels.

While this mechanism looks easier to work with, channels have their own challenges as well. Here is how channel dispatching works:

A dispatcher invokes the channels. Those channels run one by one, and the first one which succeeds wins.
A channel can match any amount of text, or none at all, and it may produce any amount of tokens, including none.
This process goes on repeatedly until no text of the input is left to match.

The consequences are as follows:

you can end up with a lot of channels;
you must account for the order in which your channels are declared;
which means that the code of your channel itself may contain a lot of logic in order to avoid this channel to match at any given point in your parsed input, etc.

One example of a Sonar language plugin using this technique is the Python language plugin.

What this package does instead

The grammar is fully declarative...

That is, you use a LexerfulGrammarBuilder and your terminals are (usually) TokenTypes...

But you only have one channel

... And this channel is a grappa parser.

Should you wonder what a grappa parser looks like, here is a JSON parser.

Advantages

Separation of concerns

You write your grammar so that you are only concerned about how your AST should look like; you need not be concerned about what text is matched by those tokens.

If anything, the elements of your language plugins which could care about the actual content of the tokens are your checks -- not the grammar.

Ease of debugging

Grappa provides the following tools to help you:

a tracer, with which you can record your parsing run;
a debugger, with which you can analyze your parsing run.

The API allows you to register a tracer which will generate trace files (those are zip files) in the directory of your choice.

Versatility

Grappa is a full fledged PEG parser; and it has mechanisms to help you get things done which go beyond PEG as well. For instance, the famous "a^n b^n c^n" grammar can be matched with this single rule:

public Rule anbncn()
{
    final Var<Integer> count = new Var<>();
    return sequence(
        oneOrMore('a'), count.set(match().length()),
        oneOrMore('b'), match().length() == count.get(),
        oneOrMore('c'), match().length() == count.get()
    );
}

How this works

The parser

In order to write a parser, you will need to extend SonarParserBase; this (abstract) parser implementation extends ListeningParser<Token.Builder> (which, by the way, means you can register any listener to your parser implementation provided it has the necessary annotations; see EventBus).

What you push is not a token directly but an instance of Token.Builder; all you have to do is use the method pushToken(someTokenType), with someTokenType being an instance of TokenType.

This method will grab the indices of the text matched by the immediately preceding rule and produce a Token.Builder on the parser's stack (which is unwound when the parsing finishes). For instance:

public Rule lotsOfAs()
{
    return sequence(oneOrMore('a'), pushToken(MyTokens.LOTS_OF_AS));
}

When the channel finishes the parsing, it will then append all the tokens to the Lexer instance associated with the channel.

Note that this is REALLY the text matched by the previous rule! That is, this for instance:

public Rule probablyNotWhatYouMeant()
{
    return sequence("foo", "bar", pushToken(MyTokens.FOOBAR)); // Oops...
}

will associate the token in the lexer with "bar", not "foobar"! You'd need to write this instead:

public Rule thatIsBetter()
{
    return sequence(
        sequence("foo", "bar"),
        pushToken(MyTokens.FOOBAR)
    );
}

The (SSLR) parser factory

This package also offers a factory which allows you to generate SSLR parsers out of two elements:

your (grappa!) parser class,
your grammar class.

From such a factory, you can obtain not only rules for parsing a full source file, but also just extracts from it; this makes it very convenient for testing only part of your parser and/or grammar.

Technical notes...

This package uses a specialized version of the grappa-tracer-backport.

This is needed because Sonar plugins require an old version of Guava (10.0.1) whereas grappa depends on Guava 18.0.

Among other things, this means that if you with to include listeners of yours in the parser (which relies on the @Subscribe annotation for the appropriate methods) you need to include this:

// Note the initial "r"!
import r.com.google.common.eventbus.Subscribe;

instead of:

import com.google.common.eventbus.Subscribe;

because the latter is not guaranteed to succeed!

litesolutions / sonar-sslr-grappa

Read me first

What this is

About SSLR

Principle

Using a `LexerlessGrammarBuilder`

Using a `LexerfulGrammarBuilder`

What this package does instead

The grammar is fully declarative...

But you only have one channel

Advantages

Separation of concerns

Ease of debugging

Versatility

How this works

The parser

The (SSLR) parser factory

Technical notes...

About

Languages

Read me first

What this is

About SSLR

Principle

Using a LexerlessGrammarBuilder

Using a LexerfulGrammarBuilder

What this package does instead

The grammar is fully declarative...

But you only have one channel

Advantages

Separation of concerns

Ease of debugging

Versatility

How this works

The parser

The (SSLR) parser factory

Technical notes...

About

Languages

Using a `LexerlessGrammarBuilder`

Using a `LexerfulGrammarBuilder`