toml-lang / toml

Tom's Obvious, Minimal Language

Home Page: https://toml.io

Proposal: Allow newlines and trailing commas in inline tables

JelteF opened this issue

Overall I really like TOML, and its syntax feels very obvious to me for the most part. The one thing that doesn't is the explicit crippling of inline tables: inline tables cannot contain newlines or trailing commas. I've read the reasoning behind this in the existing issue and PR. However, I don't think the reason given (discouraging people from using big inline tables instead of sections) outweighs the downsides. That's why I would like to open up a discussion about this.

There are three main downsides I see:

  1. It's unexpected for most people using the language. Most popular languages that have {} style mappings allow newlines in them (JSON, Python, JavaScript, Go). Newlines and trailing commas are also allowed in arrays in the TOML spec, so the spec is inconsistent in this regard.
  2. To me even small inline tables are much more readable at first glance when split over multiple lines:
# Single line
person = { name = "Tom", geography = { lat = 1.0, lon = 2.0 } }

# Multi line
person = { 
    name = "Tom", 
    geography = { 
        lat = 1.0, 
        lon = 2.0,
    },
}
  3. A deeply nested list of tables is forced to repeat a lot of keys in the section headers. Compare this version, which uses an array of tables:
[main_app.general_settings.logging]
log-lib = "logrus"

[[main_app.general_settings.logging.handlers]]
  name = "default"
  output = "stdout"
  level = "info"

[[main_app.general_settings.logging.handlers]]
  name = "stderr"
  output = "stderr"
  level = "error"

[[main_app.general_settings.logging.handlers]]
  name = "access"
  output = "/var/log/access.log"
  level = "info"

To the one with inline tables with newlines:

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {
        name = "default",
        output =  "stdout",
        level = "info",
    }, {
        name = "stderr",
        output =  "stderr",
        level = "error",
    }, {
        name = "access",
        output =  "/var/log/access.log",
        level = "info",
    },
]

Finally, extending current TOML parsers to support this is usually really easy, so that shouldn't be an argument against it either. I changed the https://github.com/pelletier/go-toml implementation to support newlines in inline tables, and I only had to change 5 lines to do it (3 of which I simply had to delete).
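To illustrate why such a change can be small, here is a minimal sketch of a toy inline-table parser. It is not go-toml's code and covers only a tiny subset (bare keys, unescaped double-quoted strings, plain numbers, nested tables). The names `parse_inline_table`, `allow_multiline`, and `ParseError` are hypothetical. The point is that the relaxed mode differs from the strict one only in which characters count as skippable whitespace and in whether a trailing comma is tolerated.

```python
class ParseError(ValueError):
    """Raised when the toy parser rejects its input."""


def parse_inline_table(text, allow_multiline=False):
    # The only lexical difference between the two modes: whether
    # newlines count as skippable whitespace between tokens.
    blanks = " \t\r\n" if allow_multiline else " \t"
    pos = 0

    def peek():
        # Skip blanks, then return the next character ("" at end of input).
        nonlocal pos
        while pos < len(text) and text[pos] in blanks:
            pos += 1
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def key():
        nonlocal pos
        peek()
        start = pos
        while pos < len(text) and (text[pos].isalnum() or text[pos] in "_-"):
            pos += 1
        if start == pos:
            raise ParseError(f"expected a key at offset {pos}")
        return text[start:pos]

    def value():
        nonlocal pos
        ch = peek()
        if ch == "{":
            return table()
        if ch == '"':
            end = text.find('"', pos + 1)
            if end == -1:
                raise ParseError("unterminated string")
            result = text[pos + 1:end]
            pos = end + 1
            return result
        start = pos
        while pos < len(text) and (text[pos].isdigit() or text[pos] in "+-."):
            pos += 1
        try:
            return float(text[start:pos])
        except ValueError:
            raise ParseError(f"expected a value at offset {start}")

    def table():
        nonlocal pos
        expect("{")
        result = {}
        while peek() != "}":
            name = key()
            expect("=")
            result[name] = value()
            if peek() == ",":
                pos += 1
                # A trailing comma is accepted only in the relaxed mode.
                if peek() == "}" and not allow_multiline:
                    raise ParseError("trailing comma not allowed")
            elif peek() != "}":
                raise ParseError(f"expected ',' or '}}' at offset {pos}")
        expect("}")
        return result

    result = table()
    if peek() != "":
        raise ParseError(f"unexpected data at offset {pos}")
    return result


SINGLE = '{ name = "Tom", geography = { lat = 1.0, lon = 2.0 } }'
MULTI = """{
    name = "Tom",
    geography = {
        lat = 1.0,
        lon = 2.0,
    },
}"""
```

With this sketch, `parse_inline_table(SINGLE)` and `parse_inline_table(MULTI, allow_multiline=True)` yield the same dictionary, while the strict mode rejects `MULTI`. A real parser also has to handle comments, escapes, and duplicate keys, so the effort in practice will vary by implementation.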

Maybe I'm off-base, but I'm not yet sold on this proposal.

Regarding the first point, considering that arrays and tables are different things, the perceived inconsistency in syntax is not a problem, is perfectly acceptable, and sets these different things off nicely.

Skipping down to point 3, isn't the following equivalent? It's already legal TOML, it's readable, and it's space-efficient, or so I like to think. You may disagree with me (especially since I swapped two of the keys) but at least take a look:

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {name = "default", level = "info",  output = "stdout"},
    {name = "stderr",  level = "error", output = "stderr"},
    {name = "access",  level = "info",  output = "/var/log/access.log"},
]

Also consider this. Any more readable?

[person]
name = "Tom"
geography = {lat = 1.0, lon = 2.0}

Inline tables are fully intended to be small tables, with multiple key/value pairs on one line. If the tables in your (quite readable) example were any larger, then double-bracket notation would make much more sense, even with repeated keys, and you'd get the one-line-one-pair that you seem to find aesthetically appealing.

In either case, we don't need to add a pseudo-JSON to get readability, no matter whether it would be simple to implement.

@eksortso I can see where you're coming from on the first point. I do disagree though, because IMHO they are very similar: they're both inline data structures. I don't know how to make that argument more convincing though. I think my main point there is: both the arrays and the inline tables have pseudo-JSON syntax, but the inline tables are missing some features right now that you would expect coming from JSON (or any other language that has similar syntax).

On the other two examples you make some good points. I took the second example because it was mentioned in the original issue. I see now though that there was discussion on that issue about whether it was even a good example.

The third one is an issue I actually have myself with my configs. I think you made some good points there as well. Aligning the keys in particular helps quite a lot with readability. I do think you cheated a bit, in a smart way, by moving the keys around. I'll expand on that point and hopefully make my arguments there a bit stronger, but of course you're still allowed to disagree:

Modified point 3

I'll show the same piece of config in different ways below and list some of the advantages and disadvantages of each one.

With double brackets

[main_app.general_settings.logging]
log-lib = "logrus"

[[main_app.general_settings.logging.handlers]]
  name = "default"
  output = "stdout"
  level = "info"

[[main_app.general_settings.logging.handlers]]
  name = "stderr"
  output = "stderr"
  level = "error"

[[main_app.general_settings.logging.handlers]]
  name = "http-access"
  output = "/var/log/access.log"
  level = "info"

[[main_app.general_settings.logging.loggers]]
  name = "default"
  handlers = ["default", "stderr"]
  level = "warning"

[[main_app.general_settings.logging.loggers]]
  name = "http-access"
  handlers = ["http-access"]
  level = "info"

Advantages:

  • Diffs are extremely clear: a changed line means that value changed.
  • Short lines

Disadvantages:

  • main_app.general_settings.logging is repeated many times
  • Hard to see at first glance that there are two distinct arrays, handlers and loggers
  • Quite a lot of vertical space is taken

With inline tables unaligned

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {name = "default", output = "stdout", level = "info"},
    {name = "stderr", output = "stderr", level = "error"},
    {name = "http-access", output = "/var/log/access.log", level = "info"},
]
loggers = [
    {name = "default", handlers = ["default", "stderr"], level = "warning"}, 
    {name = "http-access", handlers = ["http-access"], level = "info"},
]

Advantages:

  • Very little vertical space is used

Disadvantages:

  • Looks messy, which makes it hard to compare the different tables in a single list.
  • Line based diffs don't show easily what value changed.

With inline tables without newlines without reordered keys

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {name = "default",     output = "stdout",              level = "info"},
    {name = "stderr",      output = "stderr",              level = "error"},
    {name = "http-access", output = "/var/log/access.log", level = "info"},
]
loggers = [
    {name = "default",     handlers = ["default", "stderr"], level = "warning"}, 
    {name = "http-access", handlers = ["http-access"],       level = "info"},
]

Advantages:

  • Looks quite pretty
  • Very little vertical space is used

Disadvantages:

  • Quite long lines because of the added white space.
  • Changing the length of a value requires some effort. You have to change the spacing in that line or in the other lines, apart from changing the value itself.
  • Line based diffs don't show easily what value was changed.
  • Changing one value can even show other lines as changed in diffs because the spacing had to be changed.

With inline tables without newlines with reordered keys

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {name = "default",     level = "info",  output = "stdout"},
    {name = "stderr",      level = "error", output = "stderr"},
    {name = "http-access", level = "info",  output = "/var/log/access.log"},
]
loggers = [
    {name = "default",     level = "warning", handlers = ["default", "stderr"]}, 
    {name = "http-access", level = "info",    handlers = ["http-access"]},
]

Advantages:

  • Lines are less long than without reordering
  • Looks quite pretty

Disadvantages:

  • Still quite long lines.
  • You have to reorder the keys, possibly having to choose between a logical order and an order that minimises the whitespace.
  • Changing the length of a value requires some effort. You have to change the spacing in that line or in the other lines, apart from changing the value itself.
  • Line based diffs don't show easily what value changed.
  • Changing one value can even show other lines as changed in diffs because the spacing had to be changed.

With newlines

[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [
    {
        name = "default",
        output =  "stdout",
        level = "info",
    }, {
        name = "stderr",
        output =  "stderr",
        level = "error",
    }, {
        name = "http-access",
        output =  "/var/log/access.log",
        level = "info",
    },
]
loggers = [
    {
        name = "default",
        handlers = ["default", "stderr"],
        level = "warning",
    }, {
        name = "http-access",
        handlers = ["http-access"],
        level = "info",
    },
]

Advantages:

  • Diffs are extremely clear: a changed line means that the value changed.
  • Short lines

Disadvantages:

  • Needs to indent twice
  • Quite a bit of vertical space is used.

Conclusion

I think ultimately it's a matter of taste what looks better, and a matter of tradeoffs between repeated keys, vertical space, line length, diff clarity, and logical vs. visually pleasing key ordering. My main point with this example is that it would be nice if users could choose what they find more important.

I'm all for this. Just started to look into TOML properly for the first time as I was planning on using it for the configuration file for a tool I'm writing. I really like TOML overall, but this one thing makes some specific things really nasty. The bit I'm working on is actually sort of like the Docker Compose syntax in some ways.

Take this YAML for example:

version: "3"

services:
    elasticsearch:
        container_name: metrics_elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:5.5.3
        network_mode: host
        environment:
          discovery.type: single-node
          http.cors.enabled: true
          http.cors.allow-origin: "*"
          xpack.security.enabled: false
        ports:
        - 9200:9200
        - 9300:9300
        volumes:
        - elasticsearch-data:/usr/share/elasticsearch/data

    kibana:
        container_name: metrics_kibana
        image: docker.elastic.co/kibana/kibana:5.5.3
        network_mode: host
        environment:
          ELASTICSEARCH_URL: http://localhost:9200
          XPACK_MONITORING_ENABLE: false
        ports:
        - 5601:5601

volumes:
    elasticsearch-data:
        driver: local

And then compare it to the equivalent TOML:

version = "3"

[services]

  [services.elasticsearch]
  container_name = "metrics_elasticsearch"
  image = "docker.elastic.co/elasticsearch/elasticsearch:5.5.3"
  network_mode = "host"
  ports = [
    "9200:9200",
    "9300:9300"
  ]
  volumes = [
    "elasticsearch-data:/usr/share/elasticsearch/data"
  ]

    [services.elasticsearch.environment]
    "discovery.type" = "single-node"
    "http.cors.enabled" = true
    "http.cors.allow-origin" = "*"
    "xpack.security.enabled" = false

  [services.kibana]
  container_name = "metrics_kibana"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
  ports = [
    "5601:5601"
  ]

    [services.kibana.environment]
    ELASTICSEARCH_URL = "http://localhost:9200"
    XPACK_MONITORING_ENABLE = false

[volumes]

  [volumes.elasticsearch-data]
  driver = "local"

That extra level of nesting just makes TOML that much less nice to use in this case. If the environment could be on the same level as the rest of the service configuration it'd tidy it right up.

version = "3"

[services]

  [services.elasticsearch]
  container_name = "metrics_elasticsearch"
  image = "docker.elastic.co/elasticsearch/elasticsearch:5.5.3"
  network_mode = "host"
  environment = {
    "discovery.type" = "single-node",
    "http.cors.enabled" = true,
    "http.cors.allow-origin" = "*",
    "xpack.security.enabled" = false,
  }
  ports = [
    "9200:9200",
    "9300:9300"
  ]
  volumes = [
    "elasticsearch-data:/usr/share/elasticsearch/data"
  ]

  [services.kibana]
  container_name = "metrics_kibana"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
  environment = {
    ELASTICSEARCH_URL = "http://localhost:9200",
    XPACK_MONITORING_ENABLE = false,
  }
  ports = [
    "5601:5601"
  ]

[volumes]

  [volumes.elasticsearch-data]
  driver = "local"

At the heart of your issue, you have a large subtable that you wish to keep in the middle of your configurations. Not before it, and not after it. Some relief exists with inline tables and key-path assignments. But with a table nested a few layers deep, the keys would grow very long.

I still find multiline tables that look like JSON offputting. But I think I have an idea for a TOML-friendly syntax that could get you what you're wanting. I don't have time to write it down now, but I'll be back later on.

Hi @JelteF!

Thanks for filing this issue. I'm deferring any new syntax proposal as I try to ramp up my effort to get us to TOML 1.0, which will not contain any new syntax changes from TOML 0.5.

This is definitely an idea I want to explore more -- personally, I still haven't finalized how much TOML should be flat (INI-like) vs nested (JSON-like). Both approaches have their trade-offs and we'll know what we want to do for this specific request, once we finalize that overarching idea. However, I'd appreciate if we hold off that discussion until TOML 1.0 is released.

The earlier example:

[services]
  [services.kibana]
  container_name = "metrics_kibana"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
    [services.kibana.environment]
    ELASTICSEARCH_URL = "http://localhost:9200"
    XPACK_MONITORING_ENABLE = false

Could instead be:

[services]
  [.kibana]
  container_name = "metrics_kibana"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
    [.environment]
    ELASTICSEARCH_URL = "http://localhost:9200"
    XPACK_MONITORING_ENABLE = false

The example takes advantage of the dotted keys notation: if a key starts with a dot, it inherits the parent table's keyspace. I went with a dot because it has a related meaning for a relative path as well (./), although another symbol might stand out better (or the dot could instead be prepended to the table syntax: .[environment]).

The above deals with the issue of table keys getting progressively longer, where the actual unique table name gets pushed to the right (potentially requiring scrolling) and/or lost in the noise of similar table keys, as shown earlier in the thread.

Personally, I find that for nested config the table keys or dotted keys can get quite long/repetitive. It's one area that I think JSON and YAML handle better.

I still find multiline tables that look like JSON offputting. But I think I have an idea for a TOML-friendly syntax that could get you what you're wanting. I don't have time to write it down now, but I'll be back later on.

@eksortso I take it later on never came, or did you raise it in another issue? What do you think about the above?

I did find it odd that inline tables have this special syntax for single lines, unable to break across multiple lines with trailing commas like arrays can. Most newcomers to TOML will be familiar with a table/object being defined this way and it will click, until they realize it breaks as soon as they want to go to multiple lines, even though arrays don't share this restriction.

I personally prefer curly brackets for additional clarification of scope. TOML appears to rely on namespacing, which allows for a flat format as long as you pay attention to the keys. Some try to indicate the scope a bit more via the optional indentation shown earlier, but that feels uncomfortable/detached to me.

I like that ends of lines don't need commas in TOML. Although commas are required for arrays (and inline tables), could they be dropped or made optional for the multi-line variants?

[services]
  [.kibana]
  container_name = "metrics_kibana"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
  environment = {
    ELASTICSEARCH_URL = "http://localhost:9200"
    XPACK_MONITORING_ENABLE = false
  }
  ports = [
    "5601:5601"
  ]

  [.kibana_2]
  container_name = "metrics_kibana2"
  image = "docker.elastic.co/kibana/kibana:5.5.3"
  network_mode = "host"
  environment = {
    ELASTICSEARCH_URL = "http://localhost:9201"
    XPACK_MONITORING_ENABLE = false
  }
  ports = [
    "5602:5601"
  ]

This example from the project readme is a good case of verbosity/noise that made me do a double take trying to make sense of what was going on:

[[fruit]]
  name = "apple"

  [fruit.physical]  # subtable
    color = "red"
    shape = "round"

  [[fruit.variety]]  # nested array of tables
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

This is probably not much better, and might be asking for too much (straying too far from what TOML currently is?):

[[fruit]]
name = "apple"
physical { # Scoped table
  color = "red"
  shape = "round"
}
variety [ # Scoped array of tables
  name = "red delicious"
  --- # A separator between objects
  name = "granny smith"
]

[[fruit]] # Still useful as a `---` above may not be distinct enough
name = "banana"
variety [
  name = "plantain"
]

Applied to the earlier example for arrays of tables:

[main_app.general_settings.logging]
log-lib = "logrus"

handlers [
    name = "default"
    output =  "stdout"
    level = "info"
    ---
    name = "stderr"
    output =  "stderr"
    level = "error"
    ---
    name = "http-access"
    output =  "/var/log/access.log"
    level = "info"
]
loggers [
    name = "default"
    handlers = ["default", "stderr"]
    level = "warning"
    ---
    name = "http-access"
    handlers = ["http-access"]
    level = "info"
]

The use of --- as a separator between elements avoids unnecessary { } (which are useful for a single instance assigned to a key), as those add noise along with the commas that, I believe, @eksortso found offputting?

Note the lack of the assignment =. That would probably lead to some mishaps with array elements, as you'd need , (instead of inferring the separator from \n) on single lines, and objects/tables would need to be wrapped with { }.

@eksortso I take it later on never came, or did you raise it in another issue? What do you think about the above?

Ouch...

Later on came and went. See #525 for discussion, and #551 for the now-closed PR.

I'll be back in a few hours.

Ouch...

Oh, I didn't mean it that way! 😝

Later on came and went. See #525 for discussion, and #551 for the now-closed PR.

Ah, that's unfortunate.. 😞 I liked the multi-line approach you proposed, substituting commas with new lines. HJSON ended up offering a good enough solution for me offering this feature in the meantime.

Ouch...

Oh, I didn't mean it that way! 😝

No worries. But there is a link to #525 up there.

Later on came and went. See #525 for discussion, and #551 for the now-closed PR.

Ah, that's unfortunate.. 😞 I liked the multi-line approach you proposed, substituting commas with new lines. HJSON ended up offering a good enough solution for me offering this feature in the meantime.

Thanks! It's good to see how HJSON implemented it. I've seen similar patterns in other config formats, whose names I've forgotten.

But keep in mind that HJSON is based on JSON, and TOML was originally inspired by informal INI formats. What that means, philosophically, is that nesting in TOML is possible, but deep nesting is, and ought to be, discouraged. By that philosophy, shallow nesting is ideal for a configuration format, and it also works for simple data exchange uses. Over time, I've come to adopt this philosophy myself. I'm still interested in bringing back a little bit of nesting, a la #551, but unless it gains traction, I won't push for it.

Other proposals have been offered to use [.subtable] syntax for nesting. But it can get confusing if you can't keep track of your absolute path. In fact, your first example suggests that each [.subtable] nests inside its parent, but your second example suggests that each [.subtable] is a subtable of a common parent. Is [.kibana_2] actually [services.kibana_2], or [services.kibana.kibana_2]?
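The ambiguity can be made concrete with a small sketch. The function `resolve_headers` and its two rule names are hypothetical; nothing like them exists in the TOML spec. The "previous" rule resolves a relative header against whatever header came just before it, and the "absolute" rule resolves it against the most recent header written without a leading dot.

```python
def resolve_headers(headers, rule):
    """Expand hypothetical relative headers such as '.kibana'.

    rule == "previous": relative to the header immediately before.
    rule == "absolute": relative to the last header without a leading dot.
    """
    if rule not in ("previous", "absolute"):
        raise ValueError(f"unknown rule: {rule!r}")
    resolved = []
    previous = None
    last_absolute = None
    for header in headers:
        if header.startswith("."):
            base = previous if rule == "previous" else last_absolute
            if base is None:
                raise ValueError(f"{header!r} has nothing to be relative to")
            path = base + header
        else:
            path = header
            last_absolute = header
        previous = path
        resolved.append(path)
    return resolved


# Headers of the first example: environment is meant to nest inside kibana.
FIRST_EXAMPLE = ["services", ".kibana", ".environment"]
# Headers of the second example: kibana_2 is meant to be a sibling of kibana.
SECOND_EXAMPLE = ["services", ".kibana", ".kibana_2"]
```

Under the "previous" rule the first example resolves as intended, but the second yields `services.kibana.kibana_2`. Under the "absolute" rule the second resolves as intended, but the first yields `services.environment`. Neither simple rule gives both examples the meaning they appear to want.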

But there's another problem that [.subtable] syntax doesn't solve, which was my impetus for #525: it can't be used to put subtable definitions in the middle of other tables. That was relieved with the introduction of dotted keys in key/value pairs. Again, it works best for shallow nesting.

Regarding commas in arrays and in inline tables, I do feel like the rules for placing those commas ought to be strict, to prevent confusion. It's already decided that arrays require commas between elements, and that a trailing comma is fine. For inline tables, commas must separate the key/value pairs, since they're on the same line. If #551 were reintroduced, newlines could be used to separate the key/value pairs in multi-line inline tables, same as they are used for regular tables. But commas would not be allowed between lines.

I'm intrigued by some of your other proposals, particularly the --- separator in arrays of tables (which could be used with regular table-array notation, actually). But I'll suggest that you simplify their presentations, and open a new issue for each to present them. Long posts like these are often hard to follow, so less really would be more.

Perhaps worth connecting this proposal with #744 as well, for use of placeholders/shortcuts to outer tables names.

Example:

[servers]
mask = "255.255.255.0"

[*.server1] # subtable 2
ip = "192.168.0.1"

[*.server2] # subtable 3
ip = "192.168.0.2"

The above is the same as the explicit/verbose keys servers.server1 and servers.server2 in tables 2 and 3.

I agree that line breaks should be supported.

Yes, there is a form that makes the final result look good and easy to read.

But the problem is that conversion tools and serde tools can't produce it.

A conversion tool can only emit one long, unreadable line.

If line breaks were allowed, these tools could adjust the indentation to make the results look better.
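A minimal sketch of what such a tool could do, assuming bare keys and plain ASCII strings: the hypothetical `render` function below writes a nested dictionary either as a single-line inline table (valid TOML today) or as an indented multi-line one (the syntax proposed in this issue, which a TOML 1.0 parser is not expected to accept). Using `json.dumps` for string quoting is a shortcut that works for simple strings only.

```python
import json


def render(value, indent=0, multiline=False):
    """Render a value as TOML-like text; dicts become inline tables."""
    if isinstance(value, bool):  # must be checked before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # shortcut: fine for simple ASCII strings
    if isinstance(value, list):
        items = ", ".join(render(item, indent, multiline) for item in value)
        return "[" + items + "]"
    if isinstance(value, dict):
        pairs = [
            f"{key} = {render(item, indent + 1, multiline)}"
            for key, item in value.items()
        ]
        if not multiline or not pairs:
            return "{" + ", ".join(pairs) + "}"
        inner = "    " * (indent + 1)
        outer = "    " * indent
        # One pair per line, each followed by a comma (including the last).
        return "{\n" + "".join(f"{inner}{p},\n" for p in pairs) + outer + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


SAMPLE = {"cpu": 79.5, "limits": {"max": 90, "warn": True}}
```

`render(SAMPLE)` gives the one-line form, and `render(SAMPLE, multiline=True)` gives the indented form with one pair per line.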

For me as a user, the fact that newlines aren't allowed inside inline tables was extremely surprising.

For me as an implementer, that's a special case in the parser that I wish I could get rid of.

I'm all for allowing it.

For me as a user, the fact that newlines aren't allowed inside inline tables was extremely surprising.

I could say in response that the mere existence of inline tables is surprising, because the INI tradition only allows values on a single line, and even then just one key/value pair per line. Multiple lines are the exception, not the rule. And there are two other, more versatile, ways to define a table over multiple lines.

Probably ought to go to #781 and join the discussion there.

@eksortso It's a surprise within the TOML specification. If you see a file with an array with line breaks, it's quite reasonable to assume that all composite values can have line breaks in them, except it's not the case. Same goes for trailing commas.

If you see a file with an array with line breaks, it's quite reasonable to assume that all composite values can have line breaks in them

This is a good point. I'd support this change for that reason alone; it's a very weird inconsistency in the language.

Everyone has forgotten that inline tables were intended to allow brief, terse injections of small tables into a configuration. They were never intended to replace table headers and sections, and they were never intended to extend beyond a single line.

How consistent must we be? Consistent enough to nullify all intentional design choices? This is still a bad idea.

I mean, it wouldn't be hard to implement. Our work is halfway done for us already, because we can reuse the ABNF code for splitting arrays across multiple lines. This would also let us include end-of-line comments. More consistent all around.

servers = {
    alpha = {  # primary server
        ip = "10.0.0.1",
        role = "frontend",},
    beta = {  # secondary server
        ip = "10.0.0.2",
        role = "backend",},
}

And while we're at it, let's allow commas between key/value pairs outside of inline tables, so we can have more than one key/value pair on a single line. This is also a bad idea, but it's consistency, and that's what we want.

[owner]
name = "Tom Preston-Werner", dob = 1979-05-27T07:32:00-08:00,

Other benefits may come from this. If all headers were replaced with inline tables, then we could define top-level key/value pairs at the bottom of the document, or in the middle, because why not?

servers = {
    alpha = {ip = "10.0.0.1", role = "frontend",},  # primary server
    beta = {ip = "10.0.0.2", role = "backend",},  # secondary server
},

title = "TOML Example",

# This was a TOML document

Consistency over design, consistency over functionality, consistency over readability, consistency over everything else. Where does it end? When TOML becomes a superset of JSON?

Never mind the bitterness. Tell me what you think of these different ideas. Maybe you can put my fears to rest.

But if we're going to smash this piñata to bits, let's stuff it with some more sweet treats. Once again, I propose we allow newlines to separate key/value pairs as well as commas, just like we can do outside of inline tables. That will make things even more consistent. And we can still have a comma before or after the newline if we wanted.

[database]
enabled = true
ports = [ 8000, 8001, 8002 ]
data = [ ["delta", "phi"], [3.14] ]
temp_targets = {
    cpu = 79.5
    case = 72.0
}

Everyone has forgotten that inline tables were intended to allow brief, terse injections of small tables into a configuration.

I don't think anyone has forgotten, they just disagree with the intentions behind the design. The design is not sacrosanct and it should not be treated as such.

How consistent must we be? Consistent enough to nullify all intentional design choices? This is still a bad idea.

This entire bug is debating over a specific intentional design decision, and the answer seems to be "at least a tiny bit more consistency than we have now."

If TOML's primary goal were to make pretty configs, the current design already does poorly when tasked with common config structures. Those examples are at least concise and consistent, even if they're intentionally ugly. TOML doesn't currently force end users to write good looking configs, and if it did, it would have to be with parsers rejecting configs that don't follow some strictly mandated style.

then we could define top-level key/value pairs at the bottom of the document, or in the middle, because why not?

The inability to back out of a regular table to the global scope is also a surprising pain point that has come up repeatedly, dictating the order of configuration options to applications. Just because a key/value is top level doesn't mean it's important, it can be much less important than the tables that would appear before it in other languages.

I think the primary reason #551 failed to garner interest was because it would result in unexpected and surprising parsing errors for end users (as opposed to developers writing parsers). At least that was my problem with it. They will not realize or appreciate that there are two types of tables using {} and that each table must entirely conform to one style. The current design also surprises and confuses end users, as evidenced by this very bug.

@eksortso

Consistency over design, consistency over functionality, consistency over readability, consistency over everything else.

Literally nobody is saying that, but you know that. The inconsistency is dumb in this one particular context because it already causes regular, significant confusion for users.

I'd also argue that "consistency over design" is conceptually nonsensical; good design is always internally consistent. TOML has two collection value types (as in, two ways of specifying a collection on the right hand side of a KVP assignment); both are comma-delimited, but only one allows newlines. This is internally inconsistent, enough that users are regularly caught out by it.

To clarify my point:

There are three syntax classes now: top level tables (separator = newline), array (separator = comma, whitespace doesn't matter), and inline table (separator = comma, newlines not allowed). The proposal is to cut it down to two syntax classes: top-level tables and composite values (arrays and inline tables).

Our work is halfway done for us already, because we can reuse the ABNF code for splitting arrays across multiple lines.

The ABNF is hardly much more than a starting point, any real parser needs as much (if not more) code to reject invalid documents that the ABNF accepts.

There are three syntax classes now: top level tables (separator = newline), array (separator = comma, whitespace doesn't matter), and inline table (separator = comma, newlines not allowed). The proposal is to cut it down to two syntax classes: top-level tables and composite values (arrays and inline tables).

Thank you @dmbaturin, that answers my question about consistency. It still doesn't assuage my fears about forcing JSON-like patterns into TOML. But if we keep commas in the inline tables, maybe users will see the difference and won't make inelegantly designed configuration templates. They certainly could do that if we allowed it.

Our work is halfway done for us already, because we can reuse the ABNF code for splitting arrays across multiple lines.

The ABNF is hardly much more than a starting point, any real parser needs as much (if not more) code to reject invalid documents that the ABNF accepts.

I'm talking about what it would take to modify the specification. That's what this project is, after all. Our work modifying the ABNF is halfway done. There would still need to be changes to toml.md. There would need to be tests written and changed as well, and we ought to keep tabs on that effort. But parser implementations are completely separate.

There would still need to be changes to toml.md. There would need to be tests written and changed as well

I'm ready to help with that.

It still doesn't assuage my fears about forcing JSON-like patterns into TOML

A good language should facilitate good patterns (and TOML does), but no language can prevent bad ones. For example, I consider the following valid TOML ugly, but I'm not advocating making it illegal, because that would require sacrificing the entire dotted key syntax, which is a convenient shortcut that can greatly improve readability if used appropriately.

foo.bar.baz = 1
foo.bar.quux = 1
foo.bar.xyzzy = 2

At least we both likely agree that it's less readable than the intended way to write it:

[foo.bar]
  baz = 1
  quux = 1
  xyzzy = 2

Likewise, an array with tiny items and a newline after each one is a pretty bad idea, but that doesn't warrant disallowing it.

foo = [1,
2,
3]

If people have enough aesthetic judgement not to do this, I don't think they will start converting their normal tables to inline tables en masse once they see the news of newline support.

This approach could help simplify the confusion of writing TOML like this (a slightly simplified version of a real world example):

[tool.pydoc-markdown.renderer]
type = "mkdocs"

[tool.pydoc-markdown.renderer.mkdocs_config]
site_name = "HDX Python Scraper"

[[tool.pydoc-markdown.renderer.pages]]
title = "Home"

[[tool.pydoc-markdown.renderer.pages]]
title = "API Documentation"

[[tool.pydoc-markdown.renderer.pages.children]]
title = "Source Readers"
contents = ["hdx.scraper.readers.*"]

[[tool.pydoc-markdown.renderer.pages.children]]
title = "Outputs"
contents = ["hdx.scraper.jsonoutput.*", "hdx.scraper.googlesheets.*", "hdx.scraper.exceloutput.*"]

This looks much simpler in YAML:

tool:
  pydoc-markdown:
    renderer:
      type: mkdocs
      mkdocs_config:
        site_name: HDX Python Scraper
      pages:
        - title: Home
        - title: API Documentation
          children:
            - title: Source Readers
              contents:
                - hdx.scraper.readers.*
            - title: Outputs
              contents:
                - hdx.scraper.jsonoutput.*
                - hdx.scraper.googlesheets.*
                - hdx.scraper.exceloutput.*

With the proposed new syntax, would it look something like this?

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config.site_name = "HDX Python Scraper"
pages = [
    {
        title = "Home"
    },
    {
        title = "API Documentation",
        children = [
            {
                title = "Source Readers",
                contents = ["hdx.scraper.readers.*"] 
            },
            {
                title = "Outputs",
                contents = ["hdx.scraper.jsonoutput.*", "hdx.scraper.googlesheets.*", "hdx.scraper.exceloutput.*"]
            },
        ]
    }
]

If so, then I think that would be a big improvement, capturing some of the simplicity and readability of the YAML version.

Please accept this proposal and start a v2 format of toml.

The reason I left JSON for TOML is that JSON does not allow a comma after the last key/value pair in an object.
I have tried TOML and it does not allow one either!!!
TOML allows:

NormalVlanConfigList=[{PortId=2,EndPortId=14,VlanId=100},{PortId=15,EndPortId=28,VlanId=101},]

but TOML does not allow:

NormalVlanConfigList=[{PortId=2,EndPortId=14,VlanId=100,},{PortId=15,EndPortId=28,VlanId=101,}]

This is not a bug in the library (github.com/pelletier/go-toml); the spec defines it this way:
https://toml.io/en/v1.0.0#inline-table

A terminating comma (also called trailing comma) is not permitted after the last key/value pair in an inline table.

WTF?

There are also some nested-struct problems, like #781 (comment).

So I want to try another format, like Hjson, or write a new one myself.

commented

"merely using newline as a separator would make it so that you could just take an existing table definition and wrap that with {} to get a layer of nesting" (from the other issue)

@pradyunsg would that look like the Json-like example from the other issue but with the commas removed wherever there are newlines including inside [] and {} ?

Alright, so there's three open questions here:

  • Would it make sense to allow trailing commas on single line inline tables?
  • Would it make sense to enforce to either specify-everything-on-one-line or specify-everything-on-separate-lines?
  • What should the multiline {...} table syntax be consistent with: the existing inline table syntax + array (comma as a separator) or the existing regular table syntax (newlines as a separator)?

@pradyunsg would that look like the Json-like example from the other issue but with the commas removed wherever there are newlines including inside [] and {} ?

The other issue is #781. Merely stating "other issue" isn't clear enough for people who aren't subscribed to all the issues on this repository. Also, JSON. :)

And, as that is phrased currently, no -- that wouldn't involve removing the commas inside the [] but it would involve removing the commas inside the {}. :)

@pradyunsg I would rather look at it from the opposite point of view: are there compelling reasons not to allow trailing commas or enforce newlines? I don't think there are.

My answers at the moment, although I'm certainly open to being convinced either way on all of these...

Would it make sense to allow trailing commas on single line inline tables?

Yes, if we end up adding multiline tables that use a comma. If we use commas for multiline tables, it would definitely make sense there and would be a reasonably strong case to add it in here.

Maybe otherwise? Like, there's no real functional advantage unless you're generating TOML, IMO -- if a human is editing the file, they're unlikely to add a trailing comma and it's not going to make reviewing diffs easier like it would for multiline files.

Would it make sense to enforce to either specify-everything-on-one-line or specify-everything-on-separate-lines?

Yes, since that forces the same level of one-thing-at-a-time style visual separation that the existing [table] syntax forces.

Plus, it's easier to relax something like this in the future, than to become stricter. :)

What should the multiline {...} table syntax be consistent with: the existing inline table syntax + array (comma as a separator) or the existing regular table syntax (newlines as a separator)?

There's three options here:

  • newlines (i.e. be consistent with regular tables)
    • This makes it trivial to nest a table, which... I'm not sure what the value proposition is here.
    • This enforces that such tables are multiline, since newline is the separator.
  • comma (i.e. be consistent with inline tables and arrays)
    • It's not clear what the whitespace strategy would be and it's very possible for this to be used in a manner that disincentivises a better post-parse data structure.
  • comma-and-newline
    • Visually consistent with both styles.
    • Most "strict" in some senses, because it requires adhering to both constraints (multi-line + commas-as-separator).
    • This enforces that such tables are multiline, since newline is a part of the separator.
    • Requires either disallowing trailing whitespace, or dealing with trailing whitespace in the parser.

I like the last one the most; but it also feels... mildly annoying to write. 🤷🏽‍♂️

are there compelling reasons not to allow trailing commas or enforce newlines?

That's a good way of thinking of this as well.

commented

@pradyunsg It would be good to see examples of the different options eg. using the real world example from here : #516 (comment)

I'm gonna step away to do other things right now.

Please feel free to write them up yourself and post them here! There's no reason I have to do all the typing here. :)

commented

@pradyunsg OK I'll do so tomorrow morning (NZ time)

As much as I like consistency I'm really tempted to prefer newlines without commas to make it less annoying to write. Reading should be fine.

Personally I'd choose commas-only as the delimiter, consistent with arrays. If we force newlines after every single KVP it's just going to annoy/confuse since it's not what people will expect. It's already enshrined in the language that { } denotes a table with different syntactical semantics, so I think strictness concerns don't achieve much - if a user has a lot of very short KVP's and wants to lay them out in a Nx4 grid or something, it seems the opposite of "Obvious" to disallow that.

RE trailing commas: they should be allowed everywhere in every language that uses them as a delimiter, frankly. I'd go so far as to suggest that not allowing them is openly user-hostile; it's nice to be able to comment out a particular line in an array/table/dictionary/map/whatever and not need to also do an error-prone dance with commas. This has never been a concern with TOML inline tables since they didn't allow newlines, but if they are extended to allow newlines, then allowing trailing commas is an effectively mandatory side-effect IMO.

I think I'd be in favour of the "comma-only" separator option. This would, at least for me, keep the mental model of TOML syntax the simplest. There would be no "single line inline table" and "multiline inline table", just inline tables that keep the same rules whether they span one or more lines. This also brings inline table syntax in line with arrays.

Having the comma present clearly distinguishes the syntax from top level key-value pairs, which makes it clear that things like [table] headers are not allowed. I think this might be good.

commented

@pradyunsg I've had a crack at representing the 3 forms you mentioned. Please note I am a relative newbie in TOML so if I have misunderstood, please correct me:

1. Newlines:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = {
  site_name = "HDX Python Scraper"
}

pages = [
  { title = "Home" },
  {
    title = "API Documentation"
    children = [
      {
        title = "Source Readers"
        contents = [
          "hdx.scraper.readers.*",
        ]
      },
      {
        title = "Outputs"
        contents = [
          "hdx.scraper.jsonoutput.*",
          "hdx.scraper.googlesheets.*",
          "hdx.scraper.exceloutput.*",
        ]
      },
    ]
  },
]

I think the same rule would need to apply to [] as for {} (ie. remove commas within [] above) otherwise it is confusing as you can see above.

2. Comma:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = { site_name = "HDX Python Scraper" }

pages = [
  { title = "Home" },
  { title = "API Documentation", children = [
    { title = "Source Readers", contents = [
      "hdx.scraper.readers.*"
    ] },
    { title = "Outputs", contents = [
      "hdx.scraper.jsonoutput.*",
      "hdx.scraper.googlesheets.*",
      "hdx.scraper.exceloutput.*"
    ] },
  ] },
]

I have assumed that comma is the current approach you had documented here under "Already possible today".

3. Comma and newline:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = {
  site_name = "HDX Python Scraper"
}

pages = [
  { title = "Home" },
  {
    title = "API Documentation",
    children = [
      {
        title = "Source Readers",
        contents = [
          "hdx.scraper.readers.*",
        ]
      },
      {
        title = "Outputs",
        contents = [
          "hdx.scraper.jsonoutput.*",
          "hdx.scraper.googlesheets.*",
          "hdx.scraper.exceloutput.*",
        ]
      },
    ]
  },
]

Fair points @marzer and @hukkin! Just commas it is then.

Then this is just a matter of changing the ws, comment and comma handling for inline tables to be consistent with arrays. I'll file a PR for this when I get the time to do so, likely sometime this month. :)

commented

@pradyunsg I am delighted to hear that this is moving forward. Taking the spec's wording for arrays and applying it to inline tables, the spec for inline tables would read: "Inline tables can span multiple lines. A terminating comma (also called a trailing comma) is permitted after the last value of the inline table. Any number of newlines and comments may precede values, commas, and the closing bracket. Indentation between inline table values and commas is treated as whitespace and ignored."

My understanding then is that both the representations below will be valid. Please correct me if I am wrong.

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = {
  site_name = "HDX Python Scraper"
}

pages = [
  { title = "Home"},
  {
    title = "API Documentation",
    children = [
      {
        title = "Source Readers",
        contents = [
          "hdx.scraper.readers.*"
        ]
      },
      {
        title = "Outputs",
        contents = [
          "hdx.scraper.jsonoutput.*",
          "hdx.scraper.googlesheets.*",
          "hdx.scraper.exceloutput.*"
        ]
      }
    ]
  }
]

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = {
  site_name = "HDX Python Scraper",
}

pages = [
  { title = "Home", },
  {
    title = "API Documentation",
    children = [
      {
        title = "Source Readers",
        contents = [
          "hdx.scraper.readers.*",
        ]
      },
      {
        title = "Outputs",
        contents = [
          "hdx.scraper.jsonoutput.*",
          "hdx.scraper.googlesheets.*",
          "hdx.scraper.exceloutput.*",
        ],
      },
    ],
  },
]

I just came across this in my project (my first using TOML) and I think my example illustrates why the current solutions just don't "feel" clean, even though they aren't really problematic per se.

The starting point in my project was:

# Form 1
[layer.base]
name = 'Base Layer'
buttons = [
	'open-test-layer',
	'',
	'',
	'',
	'reset',
	'exit'
]

However, the buttons array is sparse and I didn't want to need to include empty keys. This is what I tried next, which seemed like a logical way to move from an array to a dict, and looks clean, but isn't currently allowed:

# Form 2
[layer.base]
name = 'Base Layer'
buttons = {
    1 = 'open-test-layer'
    5 = 'reset'
    6 = 'exit'
}

The next version of course works, but with more than 1 or 2 buttons this would begin to completely fall flat in terms of readability:

# Form 3
[layer.base]
name = 'Base Layer'
buttons = { 1 = 'open-test-layer', 5 = 'reset', 6 = 'exit' }

And finally, what I've settled on (for now) as the best available option:

# Form 4 - okay
[layer.base]
name = 'Base Layer'
[layer.base.buttons]
1 = 'open-test-layer'
5 = 'reset'
6 = 'exit'

This is not bad, but I really don't like having to repeat the parent key in the buttons table header. For longer keys (and with multiple sub-tables) this could get quite tiring.

I think my issues come down to two things:

  1. The need to re-specify the parent key for all sub-tables - I know I've seen some proposals that would allow this to be replaced with something like [.buttons] which would be quite nice. I also know there's some pushback saying that it makes it harder to read, but I disagree and I think almost any feature can be misused. :)
  2. Arrays (see Form 1) can span multiple lines, but dicts/tables cannot (Form 2). In almost all programming languages definitions for both arrays and dicts can span lines and I think this disconnect is why it feels like multi-line tables are missing, even though there's technically already an alternative.

And to add one more thing: the ending commas on arrays really seem like they could be optional - I don't believe it would introduce any ambiguity by not requiring them, but maybe others have some more well-researched thoughts on this.

Regardless, I'm really liking TOML. It's a breath of fresh air after the feature-creep abomination that YAML has turned into. 🤣

@jstm88 Using only the existing syntax, your Form 4 can also be nicely written using dotted keys:

[layer.base]
name = 'Base Layer'
buttons.1 = 'open-test-layer'
buttons.5 = 'reset'
buttons.6 = 'exit'

(It would look even better if the subtable were named "button" instead of "buttons".)

I created a few proposals for JSON-like nesting in TOML.
I prefer #898, but see also #900.

I have an idea for a single universal separation format. It incorporates the idea of newlines and trailing commas in inline tables, and much more. In a sense, it's a bound on the other extreme of this debate. Take a look at #903.

I touched a bit on this in #903 (comment), but my main concern with this is generating quality error messages.

For example:

tbl = {
    a = 1,
    b = 2,
    c = 3
k  = 4
k2 = 5

tbl = {
    a = 1,
    b = 2,
    c = 3,
k  = 4
k2 = 5

Assuming we allow both newlines and trailing commas, in the first example we can generate a good error message: after c = 3 there is no comma and now we see another key/value pair, so we can display:

Error: missing , or } after 3 in:

    c = 3
         ^

The second example is trickier; we left off the } but where do we intend the table to end? This is ambiguous; the error message here will be:

Error: missing , or } after 4 in:

    k  = 4
          ^

Which is still okay-ish, I guess, but not great either.


The difficulty here is that key = value is used both in inline tables and top-level k/v pairs. I like this feature, but it can make things a bit trickier as the same syntax is used in two different contexts.

None of this is a show-stopper as far as I'm concerned, but I'm a huge fan of accurate error messages that say "here exactly is your error", rather than "here is where I encountered a parsing error, but your actual error is a few lines up". Currently, TOML errors are almost entirely of the first kind.
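To make the two cases concrete, here is a toy sketch of how a parser that allows newlines and trailing commas would arrive at those messages. It is not taken from any real TOML implementation: `tokenize`, `parse_inline_table` and `error_for` are hypothetical names, values are limited to integers, and the input is assumed to start at the opening brace.

```python
import re

# Toy tokens: punctuation, or a bare word (used for both keys and integers).
TOKEN_RE = re.compile(r"[{},=]|[A-Za-z0-9_-]+")


def tokenize(text):
    """Return (token, line_number) pairs, skipping all whitespace."""
    return [
        (match.group(), line_no)
        for line_no, line in enumerate(text.splitlines(), start=1)
        for match in TOKEN_RE.finditer(line)
    ]


def parse_inline_table(text):
    """Parse '{ key = int, ... }' allowing newlines and a trailing comma."""
    tokens = tokenize(text) + [(None, 0)]  # sentinel marks end of input
    if tokens[0][0] != "{":
        raise ValueError("expected { at start of inline table")
    pos = 1
    table = {}
    while True:
        key, _ = tokens[pos]
        if key == "}":
            return table
        if key is None:
            raise ValueError("missing } before end of input")
        if tokens[pos + 1][0] != "=":
            raise ValueError(f"expected = after key {key!r}")
        value, value_line = tokens[pos + 2]
        if value is None or not value.isdigit():
            raise ValueError(f"expected an integer value for key {key!r}")
        table[key] = int(value)
        pos += 3
        separator, _ = tokens[pos]
        if separator == ",":
            pos += 1  # a comma may be followed by another pair or by }
        elif separator != "}":
            # The parser only notices the problem at the NEXT token, so the
            # best it can do is point back at the last value it accepted.
            source_line = text.splitlines()[value_line - 1].strip()
            raise ValueError(
                f"missing , or }} after {value} in line {value_line}: {source_line}"
            )


def error_for(text):
    """Return the error message for a document, or None if it parses."""
    try:
        parse_inline_table(text)
    except ValueError as exc:
        return str(exc)
    return None


FIRST = "{\n    a = 1,\n    b = 2,\n    c = 3\nk  = 4\nk2 = 5\n"
SECOND = "{\n    a = 1,\n    b = 2,\n    c = 3,\nk  = 4\nk2 = 5\n"

print(error_for(FIRST))   # blames the line with c = 3
print(error_for(SECOND))  # blames the line with k  = 4
```

In the second case the toy parser happily treats `k = 4` as another member of the table, so the reported location is one line below where the author most likely forgot the closing brace.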

@pradyunsg A while back, you observed that this is "just a matter of changing the ws, comment and comma handling for inline tables to be consistent with arrays," and you would file a PR. I'd like to expedite this. Do you have a PR started? Would you mind if I took a crack at it?

I'm leaving #903 open for further discussion, but it's becoming apparent that this change needs to be made. We'll retain the need for commas as separators inside inline tables even if those tables span multiple lines, and we will allow a trailing comma. From the perspective of #903, this change could be seen as a precursor. But it's necessary now.