Doesn't work with non-ASCII characters on Windows
rednoah opened this issue
Characters are not decoded correctly if file.encoding is not UTF-8. On Windows, the default file.encoding is usually something other than UTF-8.
E.g. running this code with -Dfile.encoding=ISO-8859-1 and with -Dfile.encoding=UTF-8 yields different results:
String s1 = "\"Die gelbe Hölle\"";
Object o = JsonReader.jsonToJava(s1);
String s2 = JsonWriter.objectToJson(o);
System.out.println(s1);
System.out.println(s2);
System.out.println(s1.equals(s2));
The JSON input should be converted to UTF-8 before passing it to a JSON parser. See this link for a similar issue: http://stackoverflow.com/questions/3995559/json-character-encoding
In my test case, the input is a String object, so it can't have the wrong encoding. The problem is that jsonToJava doesn't work correctly on Windows (or, generally speaking, whenever file.encoding is not set to UTF-8).
Presumably, the latest code is using String.getBytes() or new String(byte[]) somewhere, which defaults to using file.encoding internally (and not necessarily UTF-8, as one might expect).
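To illustrate the failure mode suspected here, a minimal Java sketch follows. It is not taken from json-io: the class and method names (CharsetRoundTrip, roundTrip) are hypothetical, and ISO-8859-1 merely stands in for whatever charset a non-UTF-8 file.encoding would select. It encodes a string with one charset and decodes it with another, which is roughly what happens when one side of a conversion names "UTF-8" explicitly and the other side falls back to the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not json-io code.
class CharsetRoundTrip {
    // Encodes text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // \u00f6 is the o-umlaut; the escape keeps this source file pure ASCII.
        String s = "\"Die gelbe H\u00f6lle\"";

        // Same charset on both sides: the string survives unchanged.
        String same = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(s.equals(same)); // prints "true"

        // UTF-8 bytes decoded as ISO-8859-1: the two-byte umlaut becomes two characters.
        String mixed = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(s.equals(mixed)); // prints "false"
    }
}
```

Passing a Charset explicitly on both the encoding and the decoding side makes the result independent of file.encoding, which is the property the test case in this thread checks for.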
There are only two (2) calls to .getBytes() in json-io. Both of them use the 2-argument version, with the String "UTF-8" passed to them.
Well, I'm just telling you there are some severe issues with parsing non-ASCII characters that you might not be aware of because you're developing on Linux/Mac and never tested things on Windows.
Here's a complete test case. Run it yourself if you don't believe me. :P
test.groovy
@Grab('com.cedarsoftware:json-io:4.9.9')
import com.cedarsoftware.util.io.*
import java.nio.charset.*
println Charset.defaultCharset()
def s1 = /"Die gelbe Hölle"/
def o = JsonReader.jsonToJava(s1);
def s2 = JsonWriter.objectToJson(o);
println s1
println s2
assert s1 == s2
Works if default charset is UTF-8:
$ groovy test.groovy
UTF-8
"Die gelbe Hölle"
"Die gelbe Hölle"
DOES NOT work if charset is Windows-1252:
$ JAVA_OPTS="-Dfile.encoding=Windows-1252"; groovy test.groovy
windows-1252
"Die gelbe Hölle"
"Die gelbe Hölle"
Caught: Assertion failed:
assert s1 == s2
| | |
| | "Die gelbe Hölle"
| false
"Die gelbe Hölle"
Assertion failed:
assert s1 == s2
| | |
| | "Die gelbe Hölle"
| false
"Die gelbe Hölle"
at test.run(test.groovy:14)
DOES NOT work if charset is ASCII:
$ JAVA_OPTS="-Dfile.encoding=ASCII"; groovy test.groovy
US-ASCII
"Die gelbe H??lle"
"Die gelbe H??????lle"
Caught: Assertion failed:
assert s1 == s2
| | |
| | "Die gelbe H??????lle"
| false
"Die gelbe H??lle"
Assertion failed:
assert s1 == s2
| | |
| | "Die gelbe H??????lle"
| false
"Die gelbe H??lle"
at test.run(test.groovy:14)
Thank you for providing a test case. I will have to dust off my old Windows PC and check it out.
You can debug on Mac or Linux. You just need to set file.encoding to something other than UTF-8.
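Before debugging, it may help to confirm that the setting actually took effect. The following is a small sketch with a hypothetical class name (DefaultCharsetCheck). It only reports what the JVM sees and assumes nothing about json-io.

```java
import java.nio.charset.Charset;

// Hypothetical helper; prints the charset the JVM falls back to when none is given.
class DefaultCharsetCheck {
    // Builds a one-line report of the file.encoding property and the default charset.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Run it with the same -Dfile.encoding=... flag used for the test case and check that the reported default charset is not UTF-8. As far as I know, newer JDKs (18 and later) default to UTF-8 on every platform and may not honour arbitrary values of this property, so an older JDK may be needed to reproduce the problem this way.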
Fixed and released in json-io 4.9.10.