google / error-prone

Catch common Java mistakes as compile-time errors

Home Page: https://errorprone.info

Avoid using non-ASCII Unicode characters outside of comments and literals

codefish1 opened this issue

In error-prone 2.11.0 I've started getting the following error when building within IntelliJ:

Foo.java:17:2
java: [UnicodeInCode] Avoid using non-ASCII Unicode characters outside of comments and literals, as they can be confusing.
    (see https://errorprone.info/bugpattern/UnicodeInCode)

When I view the file in Vim or a hex dump, I can't see any non-ASCII characters.

Line 17 is the end of the file. I can't supply the whole file due to work constraints, but below is a screenshot of the end of the file from hexedit:
[image: hexedit screenshot]

Within IntelliJ the formatter is doing:
[image]

If I downgrade error-prone to 2.10.0, it works fine on the offending file.

I think I've seen this a couple of times and hadn't got to the bottom of it yet.

To make it easier to debug, maybe we should improve the diagnostic to mention which non-ASCII characters it thinks it's seeing.
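
A rough sketch of what such a diagnostic could report, for anyone who wants to check a file by hand. This is not Error Prone's implementation: the class `SuspiciousCharReporter`, its methods, and the assumption that "acceptable" means printable ASCII plus tab, LF and CR are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SuspiciousCharReporter {
  // Assumption (not taken from Error Prone): printable ASCII plus
  // tab, line feed and carriage return are treated as acceptable.
  static boolean isAcceptable(char c) {
    return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
  }

  // Returns one line per suspicious character, giving its offset and code unit.
  static List<String> describe(CharSequence source) {
    List<String> findings = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
      char c = source.charAt(i);
      if (!isAcceptable(c)) {
        findings.add(String.format(Locale.ROOT, "offset %d: U+%04X", i, (int) c));
      }
    }
    return findings;
  }

  public static void main(String[] args) {
    // A source text whose final character is 0x1a, which is invisible in most editors.
    String source = "class Foo {}\n" + (char) 0x1a;
    describe(source).forEach(System.out::println); // prints: offset 13: U+001A
  }
}
```

Printing the offset and code unit makes an invisible control character obvious even when the editor and the hex dump of the file on disk look clean.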

Playing with the existing test to add an assertion on the error, I noticed it already outputs the offending line along with a ^ pointing at the offending character. But I don't get that in these cases:
[Screenshot from 2022-04-08 19-56-01]

AFAICT, because 99.9% of Java code is plain ASCII, the check is rather "dumb" and doesn't try to only flag problematic chars.

I think it's a bug which appears when running in IntelliJ.

Using a file which fails in IntelliJ (2021.3.2 Ultimate Edition), a mvn compile on the command line works. The following test, using the command line from the installation docs, also works:

javac \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core-2.11.0-with-dependencies.jar:dataflow-errorprone-3.15.0.jar \
  '-Xplugin:ErrorProne -XepDisableWarningsInGeneratedCode -XepExcludedPaths:.*/target/generated-sources/.*' \
  filename.java

I've also copied the failing file to one side and diffed it to confirm it's the same as the failing one. I then played about with the file a few times (adding and removing the last line) until it worked, and diffed again. The diff shows no difference between the files.

I wonder if IntelliJ is adding a Unicode character to the buffer for some reason.

I'm going to update the diagnostic message to print the character it's seeing, which might help debug this.

FYI, there is an issue filed on the IntelliJ side, too -- https://youtrack.jetbrains.com/issue/IDEA-288257

I've found the cause: Javac modifies the content of the file passed to it as char[] (see UnicodeReader.java:103) by replacing the last character with 0x1a. If this array is cached (the original implementation of Javac also does that, but the code in IntelliJ does this in a different way to improve performance), Error Prone may get this modified content and report an error. Note that this code in Javac was rewritten as part of JDK-8224225, so the problem shouldn't appear in Java 16 and newer versions.
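
To illustrate the mechanism, here is a minimal, self-contained sketch. It is not the Javac or IntelliJ code: `CachingSource` and `readWithSentinel` are hypothetical stand-ins for a file object that caches its content and for a reader that overwrites the last character with 0x1a in place.

```java
import java.nio.CharBuffer;

class SharedBufferDemo {
  // Hypothetical stand-in for a file object that caches its content
  // and hands out the same CharBuffer on every call.
  static final class CachingSource {
    private final CharBuffer cached;

    CachingSource(String text) {
      this.cached = CharBuffer.wrap(text.toCharArray());
    }

    CharSequence getCharContent() {
      return cached;
    }
  }

  // Hypothetical stand-in for a reader that casts the content to CharBuffer
  // and writes an end-of-input marker over the last character, without copying.
  static void readWithSentinel(CharSequence content) {
    CharBuffer buffer = (CharBuffer) content;
    char[] chars = buffer.array(); // the backing array, shared with the cache
    int last = buffer.arrayOffset() + buffer.limit() - 1;
    if (last >= 0) {
      chars[last] = 0x1a;
    }
  }

  public static void main(String[] args) {
    CachingSource source = new CachingSource("class Foo {}\n");
    readWithSentinel(source.getCharContent());

    // A later consumer of the cached content now sees the marker, not '\n'.
    CharSequence seen = source.getCharContent();
    System.out.printf("last char: U+%04X%n", (int) seen.charAt(seen.length() - 1));
    // prints: last char: U+001A
  }
}
```

The file on disk never changes in this scenario, which would be consistent with the diffs and hex dumps above showing nothing unusual.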

I'm not sure how we can fix this on the IntelliJ side. We implement javax.tools.FileObject#getCharContent and cache the content of the returned CharSequence; it's really unexpected that code in Javac casts the returned value to CharBuffer and modifies its content. Maybe this can be fixed in Error Prone? I think ignoring the 0x1a character if it's the last character in the file text is a good workaround; I doubt that any real problems will be masked by such a change.
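
The proposed workaround could look roughly like the sketch below. It is an illustration only, not the change that was actually merged into Error Prone: the class `TrailingSubFilter`, the method names, and the definition of acceptable characters (printable ASCII plus tab, LF and CR) are assumptions.

```java
class TrailingSubFilter {
  static final char SUB = 0x1a;

  // Accepts 0x1a only when it is the very last character of the text;
  // everywhere else the assumed "printable ASCII plus whitespace" rule applies.
  static boolean isAcceptable(CharSequence source, int index) {
    char c = source.charAt(index);
    if (c == SUB && index == source.length() - 1) {
      return true;
    }
    return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
  }

  // Returns the offset of the first character that would be flagged, or -1 if none.
  static int firstSuspiciousOffset(CharSequence source) {
    for (int i = 0; i < source.length(); i++) {
      if (!isAcceptable(source, i)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(firstSuspiciousOffset("class Foo {}" + SUB));   // prints: -1
    System.out.println(firstSuspiciousOffset("class" + SUB + " Foo")); // prints: 5
  }
}
```

Because only a trailing 0x1a is exempted, the same character anywhere else in the file would still be reported.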

@chashnikov FWIW, I still have this issue in Java 18 (Zulu) in IntelliJ.

Since this has been merged but is still open, can someone update this with the version where the fix will appear?

This should have been included in the recent 2.16.0 release.

FYI, I still see this on occasion in 2.16. Seems to be less common.