toml-lang / toml

Tom's Obvious, Minimal Language

Home Page: https://toml.io

Consider allowing newlines in basic and literal strings

arp242 opened this issue

I don't really see any reason why we should forbid:

key = "hello,
world"

Right now you need to use:

key = """hello,
world"""

While this isn't absolutely horrible, it's less obvious than just using " .. ".

The same would apply to '..' strings.

A \ at the end of the line would still escape the newline but no starting newline will be trimmed, which would be the only difference with multi-line strings.

We might want to disallow using it in key names though:

"key
name" = "lol?"

Or maybe just allow it?

  • It should be easy for parsers to implement this.
  • It will make TOML more obvious and easier to use, as you can "just" use a newline in a "normal" string.

I can't really think of any serious downsides, but perhaps I'm missing something?

Thoughts?

One problem with allowing this is that it's not made clear what line endings to use. Say you had the following:

greeting = 'Hi
there'

That string value is equivalent to either "Hi\nthere" or "Hi\r\nthere", but without context, we're unsure which. Arguably it would be the same as if we used triple quotes and let the parsers handle line endings for us. But basic strings make these line ending codes explicit. I prefer that they remain that way.

Note that basic strings can explicitly describe each character of each string (and that includes non-normalized characters, if we were that picky). That is a design choice that I want to keep intact.

If the precision isn't necessary, then triple quotes are best for strings with multiple lines.

Let's not water these simple principles down.

So all that said, I was hoping that someone would write something in support of this idea. Because frankly, I don't hate it. We'd have two sets of quotes for basic and literal strings each, and multi-line strings would just be those with newlines in them. Simple and obvious.

Because of cross-platform ambiguity, I would forbid any stripe of multi-line strings for keys and table names. I mean, we could let it slide, but it should be obvious that multi-line keys are a bad idea. Parsers should prohibit them. Must they?

@arp242 You write:

A \ at the end of the line would still escape the newline but no starting newline will be trimmed, which would be the only difference with multi-line strings.

There's really no reason to have different functionality between " and """, regarding escaped line endings. Unless you intend to define what an escaped newline means in the context of " quotes.

I don't think it's particularly obvious that this is a good design choice.

If you need to restrict, in a certain case, the very flexibility you're arguing for, that's a symptom that the flexibility may not be a good fit.

Additionally, I don't think it's particularly useful to relax the restriction on whether a string can span multiple lines. Allowing that isn't a pattern in nearly any popular programming language; almost all of them use a slightly different syntax for strings that stay on one line versus those that span multiple lines (if the latter is even supported).

Additionally, this would be a stronger argument if we didn't already have a way to write multi-line strings, but we do, and it works sufficiently well at the cost of 4 extra characters and a slightly more explicit signal that the reader should expect multi-line input here.

Overall, I don't think this is a good idea and I'm going to go ahead and say that it's better for TOML to not evolve in this direction.