Avoid using non-ASCII Unicode characters outside of comments and literals

Question


codefish1 opened this issue 2 years ago

In Error Prone 2.11.0, I've started getting the following error when building within IntelliJ:

Foo.java:17:2
java: [UnicodeInCode] Avoid using non-ASCII Unicode characters outside of comments and literals, as they can be confusing.
    (see https://errorprone.info/bugpattern/UnicodeInCode)

When I view the file in Vim or a hex dump, I can't see any non-ASCII characters.

Line 17 is the end of the file. I can't supply the whole file due to work constraints, but below is a screenshot of the end of the file from hexedit.

Within IntelliJ the formatter is doing

If I downgrade Error Prone to 2.10.0, it works fine on the offending file.

Liam Miller-Cushon · Answer 1 · Sat Apr 09 2022 01:29:30 GMT+0800 (China Standard Time)

I think I've seen this a couple of times and haven't gotten to the bottom of it yet.

To make it easier to debug, maybe we should improve the diagnostic to mention which non-ASCII characters it thinks it's seeing.
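A diagnostic along those lines could report where the offending character sits and which code point it is. The sketch below is hypothetical: DiagnosticSketch and describe are illustrative names, not Error Prone's API. It only shows one way to render a character as line:column plus a U+XXXX code point, which would make an invisible control character obvious.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not Error Prone's API): describes the character at a given
 * index as "line:column: U+XXXX" so invisible characters become visible in a message.
 */
public final class DiagnosticSketch {
  static String describe(CharSequence source, int index) {
    int line = 1;
    int column = 1;
    // Walk up to the index, tracking 1-based line and column numbers.
    for (int i = 0; i < index; i++) {
      if (source.charAt(i) == '\n') {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
    char c = source.charAt(index);
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    return String.format(Locale.ROOT, "%d:%d: U+%04X", line, column, (int) c);
  }

  public static void main(String[] args) {
    // A file whose final character is 0x1A (the SUB control character).
    String source = "class Foo {}\n" + (char) 0x1A;
    System.out.println(describe(source, source.length() - 1)); // prints "2:1: U+001A"
  }
}
```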

David Morris · Answer 2 · Sat Apr 09 2022 02:58:10 GMT+0800 (China Standard Time)

While playing with the existing test to add an assertion on the error, I noticed it already outputs the offending line along with a ^ pointing at the offending character. But I don't get that in these cases.

Thomas Broyer · Answer 3 · Sat Apr 09 2022 03:49:46 GMT+0800 (China Standard Time)

AFAICT, because 99.9% of Java code is plain ASCII, the check is rather "dumb" and doesn't try to only flag problematic chars.
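A minimal sketch of such a "dumb" check, under stated assumptions: it treats printable ASCII (0x20 to 0x7E) plus tab, CR and LF as acceptable and flags everything else, and unlike the real check it does not skip comments or literals. Under that assumption, a control character such as 0x1A, although technically ASCII, gets flagged too. NaiveUnicodeScan and its methods are hypothetical names, not Error Prone's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified scan: flags every character outside an assumed allowed set.
 * Comment and literal handling from the real check is deliberately omitted.
 */
public final class NaiveUnicodeScan {
  // Assumed allowed set: printable ASCII plus common whitespace.
  static boolean isAllowed(char c) {
    return (c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\r' || c == '\n';
  }

  /** Returns the index of every character that this sketch would flag. */
  static List<Integer> flaggedIndices(CharSequence source) {
    List<Integer> flagged = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
      if (!isAllowed(source.charAt(i))) {
        flagged.add(i);
      }
    }
    return flagged;
  }

  public static void main(String[] args) {
    String clean = "class Foo {}\n";
    // Same text with the trailing newline replaced by 0x1A.
    String mutated = "class Foo {}" + (char) 0x1A;
    System.out.println(flaggedIndices(clean));   // prints "[]"
    System.out.println(flaggedIndices(mutated)); // prints "[12]"
  }
}
```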

David Morris · Answer 4 · Sat Apr 09 2022 05:28:53 GMT+0800 (China Standard Time)

I think it's a bug that appears when running in IntelliJ.

Using a file that fails in IntelliJ (2021.3.2 Ultimate Edition), the following command line from the installation docs works. Running mvn compile on the command line also works.

javac \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core-2.11.0-with-dependencies.jar:dataflow-errorprone-3.15.0.jar \
  '-Xplugin:ErrorProne -XepDisableWarningsInGeneratedCode -XepExcludedPaths:.*/target/generated-sources/.*' \
  filename.java

I also copied the failing file aside and diffed it to confirm it was identical to the failing one. I then played about with the file a few times (adding and removing the last line) until it worked, and diffed again. The diff shows no difference between the files.

Liam Miller-Cushon · Answer 5 · Thu Apr 14 2022 04:39:04 GMT+0800 (China Standard Time)

I wonder if IntelliJ is adding a Unicode character to the buffer for some reason.

I'm going to update the diagnostic message to print the character it's seeing, which might help debug this.

Elena Felder · Answer 6 · Fri May 20 2022 23:11:37 GMT+0800 (China Standard Time)

FYI, there is an issue filed on the IntelliJ side, too -- https://youtrack.jetbrains.com/issue/IDEA-288257

Nikolay Chashnikov · Answer 7 · Thu Aug 11 2022 00:42:53 GMT+0800 (China Standard Time)

I've found the cause: javac modifies the content of a file passed to it as a char[] (see UnicodeReader.java:103) by replacing the last character with 0x1a. If this array is cached (javac's own implementation also caches it, but IntelliJ does so differently to improve performance), Error Prone may get this modified content and report an error. Note that this code in javac was rewritten as part of JDK-8224225, so the problem shouldn't appear in Java 16 and newer.
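The aliasing problem described above can be modelled in a few lines. This is a hypothetical sketch, not javac or IntelliJ code: CachingSource stands in for a file object that caches the CharBuffer returned from getCharContent, and consumeLikeOldJavac stands in for the pre-JDK 16 behaviour of casting to CharBuffer and writing 0x1A over the last character of the backing array. A later reader of the same cached content then sees the modified text.

```java
import java.nio.CharBuffer;

/** Hypothetical model of a cached char buffer being mutated by one consumer. */
public final class CachedContentAliasing {
  private static final char EOI = 0x1A;

  /** Stand-in for a file object whose getCharContent result is cached and reused. */
  static final class CachingSource {
    private final CharBuffer cached;

    CachingSource(String text) {
      // wrap(char[]) yields a writable, array-backed buffer.
      this.cached = CharBuffer.wrap(text.toCharArray());
    }

    CharSequence getCharContent() {
      return cached; // the same instance on every call: this is the cache
    }
  }

  /** Stand-in for the old behaviour: cast to CharBuffer and overwrite the last char. */
  static void consumeLikeOldJavac(CharSequence content) {
    if (content instanceof CharBuffer) {
      CharBuffer buffer = (CharBuffer) content;
      if (buffer.hasArray() && buffer.length() > 0) {
        char[] array = buffer.array();
        array[buffer.arrayOffset() + buffer.limit() - 1] = EOI;
      }
    }
  }

  public static void main(String[] args) {
    CachingSource source = new CachingSource("class Foo {}\n");
    consumeLikeOldJavac(source.getCharContent()); // first consumer mutates the cache
    CharSequence seenLater = source.getCharContent(); // e.g. what a later checker reads
    char last = seenLater.charAt(seenLater.length() - 1);
    System.out.printf("last char: U+%04X%n", (int) last); // prints "last char: U+001A"
  }
}
```

In this model, returning a fresh copy on each call (or an immutable String) would keep the cache intact at the cost of extra copying; that is an observation about the sketch, not a description of how IntelliJ or javac actually handle it.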

Nikolay Chashnikov · Answer 8 · Thu Aug 11 2022 00:48:04 GMT+0800 (China Standard Time)

I'm not sure how we can fix this on the IntelliJ side. We implement javax.tools.FileObject#getCharContent and cache the content of the returned CharSequence; it's really unexpected that javac casts the returned value to CharBuffer and modifies its content. Maybe this can be fixed in Error Prone? I think ignoring the 0x1a character when it's the last character in the file text is a good workaround, and I doubt any real problems would be masked by such a change.
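The suggested workaround amounts to a small change in the per-character predicate: accept 0x1A only when it is the very last character of the file text. This sketch is illustrative and is not the change that was merged into Error Prone. The base allowed set (printable ASCII plus tab, CR and LF) is an assumption, and TrailingSubWorkaround and isAcceptable are hypothetical names. The exemption lines up with JLS §3.5, which says the SUB character is ignored if it is the last character in the escaped input stream.

```java
/**
 * Hypothetical predicate sketching the workaround: a trailing SUB (0x1A) is accepted,
 * while a SUB anywhere else is still flagged.
 */
public final class TrailingSubWorkaround {
  private static final char SUB = 0x1A;

  static boolean isAcceptable(CharSequence source, int i) {
    char c = source.charAt(i);
    // Assumed base rule: printable ASCII plus common whitespace.
    if ((c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\r' || c == '\n') {
      return true;
    }
    // JLS 3.5 ignores a SUB that ends the input, so do not flag it there.
    return c == SUB && i == source.length() - 1;
  }

  public static void main(String[] args) {
    String trailing = "class Foo {}" + SUB;
    String embedded = "class Foo " + SUB + "{}";
    System.out.println(isAcceptable(trailing, trailing.length() - 1)); // prints "true"
    System.out.println(isAcceptable(embedded, 10));                    // prints "false"
  }
}
```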

Larry White · Answer 9 · Fri Sep 30 2022 22:57:27 GMT+0800 (China Standard Time)

@chashnikov FWIW, I still have this issue with Java 18 (Zulu) in IntelliJ.

Larry White · Answer 10 · Fri Oct 14 2022 02:26:08 GMT+0800 (China Standard Time)

Since the fix has been merged but the issue is still open, can someone update this with the version where the fix will appear?

Liam Miller-Cushon · Answer 11 · Fri Oct 14 2022 02:31:15 GMT+0800 (China Standard Time)

This should have been included in the recent 2.16.0 release.

kenfreeman · Answer 12 · Wed Nov 09 2022 01:19:16 GMT+0800 (China Standard Time)

FYI, I still see this occasionally in 2.16, though it seems to be less common.