toml-lang / toml

Tom's Obvious, Minimal Language

Home page: https://toml.io

Unclear whether raw tab is allowed inside multi-line basic string

jorisvr opened this issue

It is unclear from the specification whether raw tab characters are allowed inside multi-line basic strings.

On the one hand, the section "Multi-line basic strings" states that "All other whitespace and newline characters remain intact." This suggests that all whitespace is allowed inside the string, and whitespace has already been defined as tabs and spaces.

On the other hand, the same section states that "Any Unicode character may be used except those that must be escaped ... (U+0000 to U+001F, U+007F)".

So which is it? Is a raw tab allowed because it is whitespace, or forbidden because it is a control character?
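The contradiction can be made concrete with a small Python sketch. The helper names `kept_as_whitespace` and `must_be_escaped` are hypothetical; each one simply encodes one of the two sentences quoted above, and the sketch checks nothing beyond single characters.

```python
# Hypothetical helpers: each encodes one reading of the
# "Multi-line basic strings" section, for illustration only.

TOML_WHITESPACE = {"\t", " "}  # TOML defines whitespace as tab and space


def kept_as_whitespace(ch: str) -> bool:
    """Reading 1: 'All other whitespace ... remain intact'."""
    return ch in TOML_WHITESPACE


def must_be_escaped(ch: str) -> bool:
    """Reading 2: U+0000 to U+001F and U+007F must be escaped."""
    cp = ord(ch)
    return cp <= 0x1F or cp == 0x7F


# Compare how tab and space fare under both readings.
for ch in ["\t", " "]:
    print(repr(ch), kept_as_whitespace(ch), must_be_escaped(ch))
```

Tab comes out as both "kept intact" and "must be escaped", while space is only "kept intact", which is exactly the ambiguity the issue describes.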

The ABNF definition forbids raw tab characters inside multi-line basic strings, but the ABNF is currently not authoritative.

Nice catch @jorisvr! :)

I think we should clarify that not "all whitespace" is allowed unescaped.

Basic strings are designed for entering characters that are hard to type, such as control characters, by using escape sequences (a convention from the early days of computing).
Literal strings are designed for entering the escape character itself, or content that has nothing to do with programming.
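The split described here (escapes in basic strings, verbatim text in literal strings) can be sketched in Python. `decode_basic`, `decode_literal`, and `SIMPLE_ESCAPES` are hypothetical names, not part of any TOML library; the table covers only TOML's short escapes (`\b`, `\t`, `\n`, `\f`, `\r`, `\"`, `\\`) and deliberately omits the `\uXXXX` and `\UXXXXXXXX` forms.

```python
# Illustrative subset of TOML basic-string escapes; the \uXXXX and
# \UXXXXXXXX forms are deliberately left out of this sketch.
SIMPLE_ESCAPES = {
    "b": "\b",
    "t": "\t",
    "n": "\n",
    "f": "\f",
    "r": "\r",
    '"': '"',
    "\\": "\\",
}


def decode_basic(body: str) -> str:
    """Decode the body of a basic string: a backslash starts an escape."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        code = body[i + 1]
        if code not in SIMPLE_ESCAPES:
            raise ValueError(f"unsupported escape: \\{code}")
        out.append(SIMPLE_ESCAPES[code])
        i += 2
    return "".join(out)


def decode_literal(body: str) -> str:
    """Literal strings have no escapes: the body is taken verbatim."""
    return body
```

For example, a Windows path needs a doubled backslash in a basic string (`decode_basic(r"C:\\Users")`) but can be written as-is in a literal string (`decode_literal(r"C:\Users")`); both yield the same text.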

So basic strings and literal strings should not differ in how they treat tab: a tab is a human-visible character, like a space or a letter. CR and LF are the same as tab in this respect; they are special only because they split a file into lines. I even think the other control characters should not be treated specially, because there is no good reason for it and it serves no purpose: Unicode defines a great many control-type characters.

TOML is a configuration format, not a data-transfer format. I think it can ignore many common restrictions and stay as simple as possible, guided only by practical needs.

Even though it's fairly easy to type a tab, it seems that these whitespace characters were overlooked in the definition of multi-line basic strings, and in fact needlessly forbidden in single-line basic strings.

Let's let tabs be free in all strings. The escape code \t is still alright in both types of basic strings.
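Under this resolution, the control-character rule for basic strings can be sketched as below. This assumes the wording TOML 1.0 later adopted (control characters other than tab must be escaped; multi-line basic strings additionally allow line feed and carriage return). `needs_escape` and `first_illegal_char` are hypothetical names, and the sketch ignores quotes, backslashes, and the requirement that a raw CR be part of a CRLF newline.

```python
def needs_escape(ch: str, multiline: bool = False) -> bool:
    """Return True if `ch` is a control character that may not appear raw
    in a basic string, assuming tabs are allowed in all strings."""
    cp = ord(ch)
    is_control = cp <= 0x1F or cp == 0x7F
    if not is_control:
        return False
    if ch == "\t":
        # Tab (U+0009) may appear raw in every kind of string.
        return False
    if multiline and ch in ("\n", "\r"):
        # Newline characters may appear raw in multi-line strings.
        # (Lone-CR validation is outside the scope of this sketch.)
        return False
    return True


def first_illegal_char(body: str, multiline: bool = False):
    """Return the index of the first raw control character that must be
    escaped, or None if the body is acceptable under this sketch."""
    for i, ch in enumerate(body):
        if needs_escape(ch, multiline):
            return i
    return None
```

With this rule a raw tab passes in both single-line and multi-line bodies, a raw line feed passes only in multi-line bodies, and characters such as U+0000 are rejected everywhere; the `\t` escape keeps working because it is handled by escape decoding, not by this check.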

Let's let tabs be free in all strings.

Done. :)