checkstyle / checkstyle

Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. By default it supports the Google Java Style Guide and Sun Code Conventions, but is highly configurable. It can be invoked with an ANT task and a command line program.

Home Page:https://checkstyle.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Column number in `DetailNode` should start with 1

MANISH-K-07 opened this issue · comments

I have downloaded the latest checkstyle from: https://checkstyle.org/cmdline.html#Download_and_Run
https://checkstyle.org/cmdline.html#Command_line_usage
I have executed the cli and showed it below, as cli describes the problem better than 1,000 words

E:\MANISH\Workspace>cat Test.java
/**
 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 */
class Test {
        /*
          romani
          rnveach
          nrmancuso
        */
        // MANISH-K-07
}

E:\MANISH\Workspace>java -jar checkstyle-10.14.2-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]
    |   `--BLOCK_COMMENT_END -> */ [6:1]
    |--LITERAL_CLASS -> class [7:0]
    |--IDENT -> Test [7:6]
    `--OBJBLOCK -> OBJBLOCK [7:11]
        |--LCURLY -> { [7:11]
        |--BLOCK_COMMENT_BEGIN -> /* [8:1]
        |   |--COMMENT_CONTENT -> \n\t  romani\n\t  rnveach\n\t  nrmancuso\n\t [8:3]
        |   `--BLOCK_COMMENT_END -> */ [12:1]
        |--SINGLE_LINE_COMMENT -> // [13:1]
        |   `--COMMENT_CONTENT ->  MANISH-K-07\n [13:3]
        `--RCURLY -> } [14:0]

E:\MANISH\Workspace>

The de-sync of column numbers (-1) can be found at the javadoc content of the AST :

 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]


Leading asterix is being considered as position '0' because the javadoc is parsed as a separate file.
This is why column numbers shown in the AST generated through cli option -J, -- treeWithJavadoc differ by 1 for javadoc content.

The option works in a way to sync the line numbers accurately. This should be extended to column numbers too
We should expect an AST something like :

    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:1]
    |   |       |--TEXT ->  Some javadoc. [2:3]
    |   |       |--NEWLINE -> \n [2:16]

Update: Similar to #4997 , but for DetailNode.

This issue in response to @rnveach 's #14710 (comment) which IMO is true considering the fact that the cmdline docs say that both the line and column numbers "will" differ. This however proved false as the line no's are in perfect sync.
The docs have been updated as part of #14710 but since we are able to sync lines, the same could be extended to columns.

@rnveach , @romani , I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

PS: the column numbering in IDEs would exceed by 1 bcoz AST displays 0 based positioning

There should be no 0 column at all, all should start from 1.

$ java -jar checkstyle-10.15.0-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]

even COMPILATION_UNIT should be [7:1]

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

Ideally yes @romani , to avoid confusion this would help a lot..
But for some reason, AST column numbering starts from 0 conventionally.
https://www.geeksforgeeks.org/abstract-syntax-tree-ast-in-java/

I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

Could you please tell me why this is observed?

Could you please tell me why this is observed?

no idea. Somebody need to debug it.

But for some reason, AST column numbering starts from 0 conventionally.

we should start from 1, to match what user is seeing in all IDE and text editors.

no idea. Somebody need to debug it.

Will do. The parser is antlr based I believe.... so maybe @nrmancuso , any insights on why this might be happening?

we should start from 1, to match what user is seeing in all IDE and text editors.

Totally agree with this. It would help a lot to avoid all the confusion

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

This is issue #4997 . It should be slightly separate from this issue as we have a de-sync with the current numbers.

Once they are insync, if position should be upped so there is no 0, then it should fall under #4997.

I will look into this bug, but antlr is not really my stronghold. Could someone please give me an overview of the approach here?

I am removing the approved label until we have a clear picture of the scope of the work to be done here.

As I read through the description and comments:

The option works in a way to sync the line numbers accurately

antlr

It is not clear to me if the problem is in our ANTLR grammar, JavaParser, or in the CLI.

What I need to see here to approve this again:

  1. Explanation of where/how we currently mutate line numbers as mentioned above, with links.
  2. All details in PR description
  3. New framing of issue entirely; it sounds like we are going to impact the entire Javadoc AST, not just the command line option. This means that checks, filters, etc. are impacted. We need to make this very clear.
  4. Understand that this is a breaking change, and consider how this can impact users.

Doc should show how it works now, even it is weird.
We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

Doc should show how it works now, even it is weird. We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

Please restate this, sounds like we still have a lot of confusion about the problem and how we are proposing to fix it. This comment leads me to believe that you think this is a documentation/CLI issue, while @MANISH-K-07 is ready to start making changes to the ANTLR grammar:

I will look into this bug, but antlr is not really my stronghold. Could someone please give me an overview of the approach here?

My comment at #14780 (comment) stands.

@nrmancuso , I wasn't ready to do any update here, precisely why I pulled you to help me with this (#14780 (comment))

@nrmancuso , any insights on why this might be happening?

As of now, I just have basic idea of what the problem we are facing is.. not sure on why and how part :(

not sure on why and how part

Yes, this is why you need to debug and research, then provide your findings here before we approve and you start writing code. Always expect to do this for issues of any appreciable value and scope. Writing code is the easiest part; truly understanding the issue is a big part of the work, and helping others to understand is another important aspect.

Think about it like this: you start writing code, testing, etc, then send a PR only to find out that there is a massive misunderstanding, then you have to do it all over again. Then the next reviewer comes, points out a bunch of issues with your approach, then rinse and repeat. It is better to do a lot of work in the issue with research/explanations/discussion, since this work is reusable and ensures that everyone is on the same page.

Sure, thanks for the response..
Might take some time, but I will get back on this soon :)

COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
| | |--LEADING_ASTERISK -> * [2:0]

I haven't looked into this much, but this seems more like an antlr issue, or our conversion of antlr to AST.

When we create ASTs, we do a direct conversion from antlr's type. Debugging this code, shows we are giving column 0. So we take that as is, with no change.

AST is provided and used for everything, so when we provide column 0, everything else receives column 0.

Things are probably made more complex as violations possibly seem to be 1 based. This is because we add a 1 in AbstractCheck's log methods. These +1s were added as part of #4998 . Javadoc's logging uses similar log methods from AbstractCheck which have the same +1.

As another example, when don't provide a column number to Violation class, 0 is used as the default.

Also ASTs don't know about tab widths, so column number in the AST is not truly the column number. It is more the character index in the line. This is probably why ANTLR calls it's method getCharPositionInLine. It's javadoc says The index of the first character of this token relative to the beginning of the line at which it occurs, 0..n-1 which supports what I said above.

Fixing this issue will probably require little fixes all over the place.

@MANISH-K-07 Unless I am missing something, this issue doesn't seem like a desync, but is more Column number in DetailNode should start with 1.

Until #4997 is fixed, all our ASTs are 0-based. Your first post tree shows this for both ASTs. So they are technically in sync, right? If there a de-sync between violations and ASTs, this follows what I am saying in my previous post, which still stems from our ASTs are 0-based.

but this seems more like an antlr issue, or our conversion of antlr to AST.

I will look into this bug, but antlr is not really my stronghold.

That was my thought, yes.

@MANISH-K-07 Unless I am missing something, this issue doesn't seem like a desync, but is more Column number in DetailNode should start with 1.

@rnveach , well I wouldn't say this issue is entirely #4997 but actually, this is slightly dependent on it.

I say "slightly dependent" coz the 0-based numbering of AST reflects even to javadoc AST that we parse as a separate file but
the main point of this issue is how the javadoc content of AST goes out of usual and syncs perfectly with line no.s but not with column no.s. So yes, the column numbers would up by 1 but even then, we would be printing javadoc chunk wrong.

So they are technically in sync, right?

Actually, No.
Continuing my explanation from above, there is de-sync and it is happening "only" from the position of leading asterix.
As of now, while we consider 0-based, * should be position 1. After #4997 , if we transit to 1-based, * should be position 2.

 * Some javadoc.                  
 *                              
 * @author MANISH-K-07 
 * @version 1.0         

To be precise, the leading asterix is being parsed as starting position (0 as of now, 1 after #4997 ). Somehow, this numbering is not observed continuing to newline chars (#14780 (comment))

@rnveach , @romani , I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

So IMO, even if we fix 0-based column numbering to up by 1, we will still see leading asterix being parsed as position 1 therefore resulting in de-sync again.

Please do correct me If I'm wrong anywhere....