toml-lang / toml-test

A language agnostic test suite for TOML parsers.

Invalid Unicode Escapes

cyanskies opened this issue

The following tests contain invalid Unicode escapes:

  • tests/valid/key/quoted-unicode.json
  • tests/valid/string/quoted-unicode.json

Both tests contain strings containing this sequence: "\ud800\udc00 \udbff\udfff"
All four of these escape codes are surrogate code points (U+D800 to U+DFFF), which fall outside the Unicode scalar values. I suspect they're supposed to be \UXXXXXXXX-style escapes that have been generated incorrectly.

Are you sure that's invalid? I think that's just how JSON works because it's always in UTF-16 or something, but I'd have to read the spec to be sure.

e.g. \U0010ffff in TOML is 0xdb 0xff 0xdf 0xff in UTF-16 BE, and that fits with the \udbff\udfff in the JSON.
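For reference, the surrogate arithmetic behind that can be sketched in Python. This is only an illustration: `to_surrogate_pair` and `json_escape` are hypothetical helper names, not part of toml-test or any JSON library.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def json_escape(code_point: int) -> str:
    """Render the code point as a 12-character JSON \\uXXXX\\uXXXX escape."""
    high, low = to_surrogate_pair(code_point)
    return f"\\u{high:04x}\\u{low:04x}"


# The two code points that appear in the test files.
first = json_escape(0x10000)     # TOML: \U00010000
last = json_escape(0x10FFFF)     # TOML: \U0010ffff

# Cross-check against Python's own UTF-16 BE encoder.
utf16_be = chr(0x10FFFF).encode("utf-16-be")
```

Here `first` and `last` come out as the two pairs in the test files, and `utf16_be` is the same four bytes quoted above.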

Which language/JSON parser are you using for this? Most languages seem to work fine with these escapes, but I'm always open to changing something if it improves compatibility.

I guess it's this one: cyanskies/another-toml-cpp#11

Maybe I'm missing something, but I don't see a way to run the tests?

I run the tests using this repo https://github.com/cyanskies/another-toml-test
It builds encoder and decoder executables that I test using the precompiled toml-test executable.

I'm using an in-tree copy of SimpleJSON.
I was assuming that JSON was in UTF-8 and passing the string across directly, so it might be my mistake then.

From https://datatracker.ietf.org/doc/html/rfc8259 :

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".

So it seems the behaviour is correct.
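A rough sketch of what a JSON decoder has to do with such a pair before handing the string on as UTF-8, again in Python. `combine_surrogates` is a hypothetical name for illustration; Python's standard `json` module already does this combining internally, which is used here as a cross-check.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The G clef example from RFC 8259.
g_clef = combine_surrogates(0xD834, 0xDD1E)

# The sequence from the toml-test files, as it appears in the JSON source.
raw_json = r'"\ud800\udc00 \udbff\udfff"'
decoded = json.loads(raw_json)          # two scalar values separated by a space
utf8_bytes = decoded.encode("utf-8")    # what a UTF-8 based TOML library needs
```

Passing the four `\uXXXX` escapes through one at a time, without combining them, would produce lone surrogates, which cannot be encoded as valid UTF-8.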

That SimpleJSON hasn't been updated since 2016. Maybe it's bugged? Your project doesn't compile for me.

toml++ uses https://github.com/nlohmann/json: https://github.com/marzer/tomlplusplus/tree/master/vendor

toml11 has something they wrote themselves: https://github.com/ToruNiina/toml11/blob/master/tests/check_toml_test.cpp