Column number in `DetailNode` should start with 1

Question

Column number in `DetailNode` should start with 1

MANISH-K-07 opened this issue 2 months ago · comments

Manish Krishna Kandrakota commented 2 months ago

I have downloaded the latest checkstyle from: https://checkstyle.org/cmdline.html#Download_and_Run
https://checkstyle.org/cmdline.html#Command_line_usage
I have executed the cli and showed it below, as cli describes the problem better than 1,000 words

E:\MANISH\Workspace>cat Test.java
/**
 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 */
class Test {
        /*
          romani
          rnveach
          nrmancuso
        */
        // MANISH-K-07
}

E:\MANISH\Workspace>java -jar checkstyle-10.14.2-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]
    |   `--BLOCK_COMMENT_END -> */ [6:1]
    |--LITERAL_CLASS -> class [7:0]
    |--IDENT -> Test [7:6]
    `--OBJBLOCK -> OBJBLOCK [7:11]
        |--LCURLY -> { [7:11]
        |--BLOCK_COMMENT_BEGIN -> /* [8:1]
        |   |--COMMENT_CONTENT -> \n\t  romani\n\t  rnveach\n\t  nrmancuso\n\t [8:3]
        |   `--BLOCK_COMMENT_END -> */ [12:1]
        |--SINGLE_LINE_COMMENT -> // [13:1]
        |   `--COMMENT_CONTENT ->  MANISH-K-07\n [13:3]
        `--RCURLY -> } [14:0]

E:\MANISH\Workspace>

The de-sync of column numbers (-1) can be found at the javadoc content of the AST :

 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]

Leading asterix is being considered as position '0' because the javadoc is parsed as a separate file.
This is why column numbers shown in the AST generated through cli option -J, -- treeWithJavadoc differ by 1 for javadoc content.

The option works in a way to sync the line numbers accurately. This should be extended to column numbers too
We should expect an AST something like :

    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:1]
    |   |       |--TEXT ->  Some javadoc. [2:3]
    |   |       |--NEWLINE -> \n [2:16]

Update: Similar to #4997 , but for `DetailNode`.

Manish Krishna Kandrakota · Answer 1 · Thu Apr 11 2024 16:25:58 GMT+0800 (China Standard Time)

This issue in response to @rnveach 's #14710 (comment) which IMO is true considering the fact that the cmdline docs say that both the line and column numbers "will" differ. This however proved false as the line no's are in perfect sync.
The docs have been updated as part of #14710 but since we are able to sync lines, the same could be extended to columns.

@rnveach , @romani , I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

PS: the column numbering in IDEs would exceed by 1 bcoz AST displays 0 based positioning

Roman Ivanov · Answer 2 · Thu Apr 11 2024 21:33:39 GMT+0800 (China Standard Time)

There should be no 0 column at all, all should start from 1.

$ java -jar checkstyle-10.15.0-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]

even COMPILATION_UNIT should be [7:1]

Manish Krishna Kandrakota · Answer 3 · Thu Apr 11 2024 21:55:51 GMT+0800 (China Standard Time)

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

Ideally yes @romani , to avoid confusion this would help a lot..
But for some reason, AST column numbering starts from 0 conventionally.
https://www.geeksforgeeks.org/abstract-syntax-tree-ast-in-java/

I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

Could you please tell me why this is observed?

Roman Ivanov · Answer 4 · Thu Apr 11 2024 22:11:19 GMT+0800 (China Standard Time)

Could you please tell me why this is observed?

no idea. Somebody need to debug it.

But for some reason, AST column numbering starts from 0 conventionally.

we should start from 1, to match what user is seeing in all IDE and text editors.

Manish Krishna Kandrakota · Answer 5 · Thu Apr 11 2024 22:20:14 GMT+0800 (China Standard Time)

no idea. Somebody need to debug it.

Will do. The parser is antlr based I believe.... so maybe @nrmancuso , any insights on why this might be happening?

we should start from 1, to match what user is seeing in all IDE and text editors.

Totally agree with this. It would help a lot to avoid all the confusion

rnveach · Answer 6 · Fri Apr 12 2024 05:56:29 GMT+0800 (China Standard Time)

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

This is issue #4997 . It should be slightly separate from this issue as we have a de-sync with the current numbers.

Once they are insync, if position should be upped so there is no 0, then it should fall under #4997.

Manish Krishna Kandrakota · Answer 7 · Fri Apr 12 2024 10:02:54 GMT+0800 (China Standard Time)

I will look into this bug, but antlr is not really my stronghold. Could someone please give me an overview of the approach here?

Nick Mancuso · Answer 8 · Wed Apr 17 2024 21:48:45 GMT+0800 (China Standard Time)

I am removing the approved label until we have a clear picture of the scope of the work to be done here.

As I read through the description and comments:

The option works in a way to sync the line numbers accurately

antlr

It is not clear to me if the problem is in our ANTLR grammar, JavaParser, or in the CLI.

What I need to see here to approve this again:

Explanation of where/how we currently mutate line numbers as mentioned above, with links.
All details in PR description
New framing of issue entirely; it sounds like we are going to impact the entire Javadoc AST, not just the command line option. This means that checks, filters, etc. are impacted. We need to make this very clear.
Understand that this is a breaking change, and consider how this can impact users.

Roman Ivanov · Answer 9 · Thu Apr 18 2024 07:34:11 GMT+0800 (China Standard Time)

Doc should show how it works now, even it is weird.
We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

Nick Mancuso · Answer 10 · Thu Apr 18 2024 20:08:58 GMT+0800 (China Standard Time)

Doc should show how it works now, even it is weird. We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

Please restate this, sounds like we still have a lot of confusion about the problem and how we are proposing to fix it. This comment leads me to believe that you think this is a documentation/CLI issue, while @MANISH-K-07 is ready to start making changes to the ANTLR grammar:

I will look into this bug, but antlr is not really my stronghold. Could someone please give me an overview of the approach here?

My comment at #14780 (comment) stands.

Manish Krishna Kandrakota · Answer 11 · Thu Apr 18 2024 20:28:50 GMT+0800 (China Standard Time)

@nrmancuso , I wasn't ready to do any update here, precisely why I pulled you to help me with this (#14780 (comment))

@nrmancuso , any insights on why this might be happening?

As of now, I just have basic idea of what the problem we are facing is.. not sure on why and how part :(

Nick Mancuso · Answer 12 · Thu Apr 18 2024 20:34:17 GMT+0800 (China Standard Time)

not sure on why and how part

Yes, this is why you need to debug and research, then provide your findings here before we approve and you start writing code. Always expect to do this for issues of any appreciable value and scope. Writing code is the easiest part; truly understanding the issue is a big part of the work, and helping others to understand is another important aspect.

Think about it like this: you start writing code, testing, etc, then send a PR only to find out that there is a massive misunderstanding, then you have to do it all over again. Then the next reviewer comes, points out a bunch of issues with your approach, then rinse and repeat. It is better to do a lot of work in the issue with research/explanations/discussion, since this work is reusable and ensures that everyone is on the same page.

Manish Krishna Kandrakota · Answer 13 · Thu Apr 18 2024 20:43:37 GMT+0800 (China Standard Time)

Sure, thanks for the response..
Might take some time, but I will get back on this soon :)

rnveach · Answer 14 · Mon Apr 22 2024 03:06:34 GMT+0800 (China Standard Time)

COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
| | |--LEADING_ASTERISK -> * [2:0]

I haven't looked into this much, but this seems more like an antlr issue, or our conversion of antlr to AST.

When we create ASTs, we do a direct conversion from antlr's type. Debugging this code, shows we are giving column 0. So we take that as is, with no change.

AST is provided and used for everything, so when we provide column 0, everything else receives column 0.

Things are probably made more complex as violations possibly seem to be 1 based. This is because we add a 1 in AbstractCheck's log methods. These +1s were added as part of #4998 . Javadoc's logging uses similar log methods from AbstractCheck which have the same +1.

As another example, when don't provide a column number to Violation class, 0 is used as the default.

Also ASTs don't know about tab widths, so column number in the AST is not truly the column number. It is more the character index in the line. This is probably why ANTLR calls it's method getCharPositionInLine. It's javadoc says The index of the first character of this token relative to the beginning of the line at which it occurs, 0..n-1 which supports what I said above.

Fixing this issue will probably require little fixes all over the place.

rnveach · Answer 15 · Mon Apr 22 2024 03:18:15 GMT+0800 (China Standard Time)

@MANISH-K-07 Unless I am missing something, this issue doesn't seem like a desync, but is more Column number in DetailNode should start with 1.

Until #4997 is fixed, all our ASTs are 0-based. Your first post tree shows this for both ASTs. So they are technically in sync, right? If there a de-sync between violations and ASTs, this follows what I am saying in my previous post, which still stems from our ASTs are 0-based.

Manish Krishna Kandrakota · Answer 16 · Mon Apr 22 2024 09:16:27 GMT+0800 (China Standard Time)

but this seems more like an antlr issue, or our conversion of antlr to AST.

I will look into this bug, but antlr is not really my stronghold.

That was my thought, yes.

@MANISH-K-07 Unless I am missing something, this issue doesn't seem like a desync, but is more Column number in DetailNode should start with 1.

@rnveach , well I wouldn't say this issue is entirely #4997 but actually, this is slightly dependent on it.

I say "slightly dependent" coz the 0-based numbering of AST reflects even to javadoc AST that we parse as a separate file but
the main point of this issue is how the javadoc content of AST goes out of usual and syncs perfectly with line no.s but not with column no.s. So yes, the column numbers would up by 1 but even then, we would be printing javadoc chunk wrong.

So they are technically in sync, right?

Actually, No.
Continuing my explanation from above, there is de-sync and it is happening "only" from the position of leading asterix.
As of now, while we consider 0-based, * should be position 1. After #4997 , if we transit to 1-based, * should be position 2.

 * Some javadoc.                  
 *                              
 * @author MANISH-K-07 
 * @version 1.0

To be precise, the leading asterix is being parsed as starting position (0 as of now, 1 after #4997 ). Somehow, this numbering is not observed continuing to newline chars (#14780 (comment))

@rnveach , @romani , I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

So IMO, even if we fix 0-based column numbering to up by 1, we will still see leading asterix being parsed as position 1 therefore resulting in de-sync again.

Please do correct me If I'm wrong anywhere....

Column number in `DetailNode` should start with 1

Update: Similar to #4997 , but for DetailNode.

Update: Similar to #4997 , but for `DetailNode`.