openrewrite / rewrite

Automated mass refactoring of source code.

Home Page:https://docs.openrewrite.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Print equality issue with `<!-- -->` in JavaDoc

loetifuss opened this issue · comments

What problem are you trying to solve?

When refactoring large codebases chances are fairly high that a number of Java sources fail to be parsed because of invalid/unexpected tokens in Javadoc (see example below and this issue). Apart from printing out a single message on the console this goes fairly unnoticed and the recipe finishes despite some sources not being processed by the recipe.

Example parsing issue:

    @Test
    void dontFailParsing() {
        rewriteRun(
                spec -> spec.parser(JavaParser.fromJavaVersion()
                        .logCompilationWarningsAndErrors(true)),
                java(
                        """
                                package foo.bar;

                                /**
                                * This is a broken comment
                                * <!-- begin-user-doc --> bla <!-- end-user-doc
                                * -->
                                */
                                public class MyClass {}
                                """));
    }

Have you considered any alternatives or workarounds?

The only workaround I can think of would be to fix all issues with Javadoc parsing

Describe the solution you'd like

Provide an option (e.g. in the 'rewrite' dsl of the Gradle plugin) to skip parsing of Javadoc for source files.

Additional context

It seems there are more issues with the parsing of Javadoc:

hi! thanks for the runnable example and linking back to an earlier case. Rather than introduce a skip option for individual parsers we'd much rather fix the underlying parser issue. Did you already explore what kind of error you get running your test, and where that fails in the JavaDoc parser?

Running the test I only get a generic error message when Openrewrite asserts the printed source matches the original source code.

When parsing and printing the source code back to text without modifications, the printed source didn't match the original source code. This means there is a bug in the parser implementation itself. Please open an issue to report this, providing a sample of the code that generated this error for "foo\bar\MyClass.java":

Any hints on where I could set a breakpoint to explore?

Running the original recipe on our codebase using Gradle I only get messages like below, but running with --info doesn't print any stacktraces:

There were problems parsing some source files, run with --info to see full stack traces                        
There were problems parsing bla\src\foo\bar\SomeClass.java 

I've had a quick run here adding your unit test to JavadocTest

    @Test
    @Issue("https://github.com/openrewrite/rewrite/issues/4248")
    void brokenComment() {
        rewriteRun(
          spec -> spec.parser(JavaParser.fromJavaVersion()
            .logCompilationWarningsAndErrors(true)),
          java(
            """
              package foo.bar;

              /**
              * This is a broken comment
              * <!-- begin-user-doc --> bla <!-- end-user-doc
              * -->
              */
              public class MyClass {}
              """
          )
        );
    }

That indeed seems to stumble across this assertion built into the testing framework:

assertContentEquals(
sourceFile,
StringUtils.readFully(input.getSource(ctx), parser.getCharset(ctx)),
sourceFile.printAll(out.clone()),
"When parsing and printing the source code back to text without modifications, " +
"the printed source didn't match the original source code. This means there is a bug in the " +
"parser implementation itself. Please open an issue to report this, providing a sample of the " +
"code that generated this error for"
);

If you want to see what happens as the parser runs, then you can set a breakpoint on this method:

public Tree visitDocComment(DocCommentTree node, List<Javadoc> body) {
init();

From the test failure it seems we're adding a LineBreak that should not be there after the comment
image

image

A simpler repro case from #4345:

/**
 *  <!--
 Does this html-style comment cause a "is not print idempotent" parsing problem?
 -->
 */
public class FailsToParse {
}