edn-format / edn

Extensible Data Notation

Ambiguous string equality

jml opened this issue

The edn spec says:

Strings ... [m]ay span multiple lines.

And later:

strings ... are equal to values of the same type with the same edn representation.

Does this mean that this edn expression:

"foo
bar"

Is not equal to this edn expression?:

"foo\nbar"

If not, how are implementors expected to read a string and then reliably print a string that's equal?

This came up writing an implementation in Python.
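One way to get a reliable round trip can be sketched in Python: decode escapes when reading, and always emit one fixed form when printing. The names `read_edn_string` and `print_edn_string` are hypothetical, and the sketch handles only the escapes \t, \r, \n, \\ and \". It also assumes equality is decided on the decoded value, which the spec text quoted above does not state outright.

```python
# Hypothetical helper names; a sketch, not a full edn reader.
ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\", '"': '"'}
UNESCAPES = {value: "\\" + key for key, value in ESCAPES.items()}


def read_edn_string(text):
    """Decode one edn string literal, including its surrounding quotes."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("not a quoted edn string")
    body = text[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1
            if i >= len(body) or body[i] not in ESCAPES:
                raise ValueError("unsupported escape")
            out.append(ESCAPES[body[i]])
        elif ch == '"':
            raise ValueError("unescaped quote inside string")
        else:
            out.append(ch)  # literal line breaks are kept as-is
        i += 1
    return "".join(out)


def print_edn_string(value):
    """Emit one fixed textual form: every special character is escaped."""
    return '"' + "".join(UNESCAPES.get(ch, ch) for ch in value) + '"'


multi_line = '"foo\nbar"'   # literal line break between foo and bar
escaped = '"foo\\nbar"'     # backslash followed by n

# Both source texts decode to the same in-memory value.
assert read_edn_string(multi_line) == read_edn_string(escaped)
# Printing always yields the escaped form, whichever form was read.
assert print_edn_string(read_edn_string(multi_line)) == escaped
```

With this design, the printed text may differ from the text that was read, but reading the printed text back always gives an equal value.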

It depends on your newline encoding, doesn't it? edn is a text format. If your editor inserts a \r\n for a new line, that's it.
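To make the line-ending point concrete, here is a small Python sketch. It assumes a reader that keeps literal line breaks exactly as they appear in the file; whether an edn reader should normalize them is not settled by the spec text quoted in this thread. The `normalize_newlines` helper is hypothetical.

```python
def normalize_newlines(value):
    """Hypothetical helper: map CRLF and lone CR line breaks to LF."""
    return value.replace("\r\n", "\n").replace("\r", "\n")


unix_text = "foo\nbar"       # string body from a file saved with LF endings
windows_text = "foo\r\nbar"  # same body from a file saved with CRLF endings

# Without normalization the in-memory strings differ by one character.
assert unix_text != windows_text
assert len(windows_text) == len(unix_text) + 1

# After normalization they compare equal.
assert normalize_newlines(windows_text) == unix_text
```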

Well, yes, but that's not what I'm getting at. Even with normal UNIX encoding, "foo\nbar" and "foo
bar" are different representations: one has a backslash in it, the other does not. Therefore, according to the spec, they are not equal, even with UNIX line encoding.

If you write

(= "foo\nbar" "foo
bar")

in a Clojure REPL, it returns true. So yes, it is a different representation or encoding of the same string.

Right, so the spec is incorrect (or at least, incomplete) when it says "nil, booleans, strings, characters, and symbols are equal to values of the same type with the same edn representation." (emphases mine)

The question is: what is meant by "edn representation"? I think it's safe to assume that string equality is based on the in-memory list of chars a string is made of. The encoding \n and the byte decimal 10 will lead to the same in-memory char. So the two strings are equal.
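The in-memory view can be illustrated in Python by listing code points. In this sketch, `decode` is a hypothetical stand-in for a reader's escape handling, limited to the escapes \t, \r, \n, \\ and \"; the source texts are given without their surrounding quotes.

```python
import re

ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\", '"': '"'}


def decode(body):
    """Hypothetical stand-in for a reader: resolve escapes in a string body.

    An escape outside the ESCAPES table raises KeyError.
    """
    return re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], body)


escaped_body = "foo\\nbar"  # 8 characters of source text: backslash, then n
literal_body = "foo\nbar"   # 7 characters of source text: a real line break

# The source texts differ, but the decoded code points do not.
assert escaped_body != literal_body
print([ord(c) for c in decode(escaped_body)])  # [102, 111, 111, 10, 98, 97, 114]
print([ord(c) for c in decode(literal_body)])  # [102, 111, 111, 10, 98, 97, 114]
assert decode(escaped_body) == decode(literal_body)
```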

But you are right. The spec has to define the canonical representation of a string. One can't take Java Strings because it has to work on other platforms as well.

The spec says strings with the same representation are equal. It does not say that different representations cannot be equal (e.g. an embedded newline and \n).