jdereg / json-io

Convert Java to JSON. Convert JSON to Java. Pretty print JSON. Java JSON serializer. Deep copy Java object graphs.

Doesn't work with non-ASCII characters on Windows

rednoah opened this issue

Characters are not decoded correctly if file.encoding is not UTF-8, which is the default situation on Windows.

For example, running this code with -Dfile.encoding=ISO-8859-1 and with -Dfile.encoding=UTF-8 yields different results:

String s1 = "\"Die gelbe Hölle\"";
Object o = JsonReader.jsonToJava(s1);
String s2 = JsonWriter.objectToJson(o);

System.out.println(s1);
System.out.println(s2);
System.out.println(s1.equals(s2));

The JSON input should be converted to UTF-8 before passing it to a JSON parser. See this link for a similar issue: http://stackoverflow.com/questions/3995559/json-character-encoding

In my test case, the input is a String object, so it can't have the wrong encoding. The problem is that jsonToJava doesn't work correctly on Windows (or generally speaking, if file.encoding is not set to UTF-8).

Presumably, the latest code calls String.getBytes() or new String(byte[]) somewhere; both default to file.encoding internally (and not necessarily to UTF-8, as one might expect).
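
The suspected mechanism can be reproduced without json-io. The following is a minimal sketch, not json-io's actual code: the class and method names are hypothetical, and ISO-8859-1 stands in for a non-UTF-8 platform default. It encodes a string as UTF-8 and decodes the bytes with a different charset, which is roughly what happens when one side names UTF-8 explicitly and the other falls back to file.encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of json-io.
public class CharsetMismatchDemo {

    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The o-umlaut is written as a Unicode escape so that the result
        // does not depend on the encoding of this source file.
        String s = "\"Die gelbe H\u00f6lle\"";

        // Same charset on both sides: the string survives unchanged.
        String same = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8 equal: " + s.equals(same));

        // UTF-8 bytes decoded as ISO-8859-1: the two-byte o-umlaut becomes
        // two separate characters, so the strings no longer match.
        String mixed = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 -> ISO-8859-1 equal: " + s.equals(mixed));
        System.out.println("extra characters: " + (mixed.length() - s.length()));
    }
}
```

Passing an explicit charset on both the encoding and the decoding side, as in the first call, is the usual way to make the result independent of file.encoding.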

There are only two (2) calls to .getBytes() in json-io. Both of them use the 2-argument version, with the String "UTF-8" passed to them.

Well, I'm just telling you there are some severe issues with parsing non-ASCII characters that you might not be aware of, because you're developing on Linux/Mac and have never tested things on Windows.

Here's a complete test case. Run it yourself if you don't believe me. :P

test.groovy

@Grab('com.cedarsoftware:json-io:4.9.9')
import com.cedarsoftware.util.io.*
import java.nio.charset.*

println Charset.defaultCharset()

def s1 = /"Die gelbe Hölle"/
def o = JsonReader.jsonToJava(s1);
def s2 = JsonWriter.objectToJson(o);

println s1
println s2

assert s1 == s2

Works if default charset is UTF-8:

$ groovy test.groovy
UTF-8
"Die gelbe Hölle"
"Die gelbe Hölle"

DOES NOT work if charset is Windows-1252:

$ JAVA_OPTS="-Dfile.encoding=Windows-1252"; groovy test.groovy
windows-1252
"Die gelbe Hölle"
"Die gelbe Hölle"
Caught: Assertion failed:

assert s1 == s2
       |  |  |
       |  |  "Die gelbe Hölle"
       |  false
       "Die gelbe Hölle"

Assertion failed:

assert s1 == s2
       |  |  |
       |  |  "Die gelbe Hölle"
       |  false
       "Die gelbe Hölle"

	at test.run(test.groovy:14)

DOES NOT work if charset is ASCII:

$ JAVA_OPTS="-Dfile.encoding=ASCII"; groovy test.groovy
groovy test.groovy
US-ASCII
"Die gelbe H??lle"
"Die gelbe H??????lle"
Caught: Assertion failed:

assert s1 == s2
       |  |  |
       |  |  "Die gelbe H??????lle"
       |  false
       "Die gelbe H??lle"

Assertion failed:

assert s1 == s2
       |  |  |
       |  |  "Die gelbe H??????lle"
       |  false
       "Die gelbe H??lle"

	at test.run(test.groovy:14)

Thank you for providing a test case. I will have to dust off my old Windows PC and check it out.

You can debug this on Mac or Linux. You just need to set file.encoding to something other than UTF-8.
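
To confirm that the flag actually took effect, and to see which charsets can represent the test string at all, a small check along these lines may help. It is a sketch with hypothetical names and does not touch json-io; it relies only on String.getBytes(Charset) substituting the charset's replacement byte (? for US-ASCII) for characters it cannot encode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of json-io.
public class DefaultCharsetCheck {

    // Encodes and decodes with the same charset. Characters the charset cannot
    // represent come back as its replacement character, not as the original.
    static String throughCharset(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The o-umlaut is written as a Unicode escape to stay independent
        // of the encoding of this source file.
        String s = "Die gelbe H\u00f6lle";

        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def);
        System.out.println("survives default charset: " + throughCharset(s, def).equals(s));
        System.out.println("survives US-ASCII: "
                + throughCharset(s, StandardCharsets.US_ASCII).equals(s));
    }
}
```

Run it with, for example, `java -Dfile.encoding=ISO-8859-1 DefaultCharsetCheck` and compare the first line with the charset you asked for. On JDK 18 and later the default is UTF-8 and the property may be handled differently, so the override is best tested on a JDK from the era of this issue.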

Fixed and released in json-io 4.9.10.