toml-lang / toml

Tom's Obvious, Minimal Language

Home Page: https://toml.io

Ambiguity about comments in abnf for arrays (toml 1.0.0)

auronandace opened this issue

The ABNF states the following:

;; Whitespace

ws = *wschar
wschar =  %x20  ; Space
wschar =/ %x09  ; Horizontal tab

;; Newline

newline =  %x0A     ; LF
newline =/ %x0D.0A  ; CRLF

;; Comment

comment-start-symbol = %x23 ; #
non-ascii = %x80-D7FF / %xE000-10FFFF
non-eol = %x09 / %x20-7F / non-ascii

comment = comment-start-symbol *non-eol

;; Array

array = array-open [ array-values ] ws-comment-newline array-close

array-open =  %x5B ; [
array-close = %x5D ; ]

array-values =  ws-comment-newline val ws-comment-newline array-sep array-values
array-values =/ ws-comment-newline val ws-comment-newline [ array-sep ]

array-sep = %x2C  ; , Comma

ws-comment-newline = *( wschar / [ comment ] newline )

If I'm reading this right, that means ws-comment-newline can contain any number of whitespace characters or newlines, with each newline optionally preceded by a comment.

I can see that array-values can have comments before and after each value. The array itself can also have comments before the array-close. For clarity I'll refer to these as before_val_comment, after_val_comment and array_comment.

How does one differentiate between an after_val_comment and an array_comment specifically at the end of an array?

I can see that the array-sep is optional at the end of an array. Without a trailing array-sep, I don't see how one can be certain whether the trailing comments belong to after_val_comment or to array_comment. Am I missing something obvious? Do people just assume they belong to array_comment because they are at the end? If there are multiple comments, could they ever be split between after_val_comment and array_comment when the optional array-sep is absent?

Is it simply the case that most parsers ignore whitespace and comments wherever they are valid?
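To make the question concrete, here is a minimal Python sketch that lists every way a run of trailing comments could be divided between the two adjacent ws-comment-newline slots when there is no trailing array-sep. The function name `possible_splits` is hypothetical and is not part of any TOML library; the sketch only models comment lines, not individual whitespace characters.

```python
from typing import List, Tuple

# Hypothetical helper: models the two adjacent ws-comment-newline slots
# that follow the last value when no trailing array-sep is present.
#
# Example TOML being reasoned about:
#
#   a = [
#     1
#     # first
#     # second
#   ]
def possible_splits(trailing: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return every (after_val_comment, array_comment) division.

    The first slot takes some prefix of the comments and the second slot
    takes the rest, so n comments give n + 1 grammatically valid divisions.
    """
    return [(trailing[:i], trailing[i:]) for i in range(len(trailing) + 1)]


if __name__ == "__main__":
    for after_val, array_level in possible_splits(["# first", "# second"]):
        print("after_val_comment:", after_val, "| array_comment:", array_level)
```

Under this reading, two trailing comments admit three divisions, including one where the comments are split between the two slots.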

> How does one differentiate between an after_val_comment and an array_comment specifically at the end of an array?

Is there an ambiguous parse here? Yes. Are you missing something? Sort of.

Most parsers don't use the ABNF directly. For the ones that do, I'd expect that they're left-recursive and consistently consume the comments as part of after_val_comment (when there's no trailing comma).
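The policy described above can be sketched as a small hand-written parser. This is an illustration under stated assumptions, not how any particular TOML implementation works: it handles only a tiny subset (flat arrays of integers, commas, and comments), and the names `parse_array`, `before`, `after`, and `array_comments` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Tokens for a deliberately tiny subset: integers, comments, and punctuation.
# This is NOT a TOML parser; it only illustrates one comment-attachment policy.
TOKEN = re.compile(r"\s*(?:(?P<int>[+-]?\d+)|(?P<comment>#[^\n]*)|(?P<punct>[\[\],]))")


def tokenize(text: str) -> List[Tuple[str, str]]:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_array(text: str) -> Dict[str, list]:
    """Parse a flat integer array, attaching comments greedily.

    Assumed policy: a comment that follows a value with no comma in
    between is stored on that value ("after"). Comments seen after '['
    or ',' go to the next value ("before"), or to the array itself if
    ']' comes next.
    """
    tokens = tokenize(text)
    if len(tokens) < 2 or tokens[0] != ("punct", "[") or tokens[-1] != ("punct", "]"):
        raise ValueError("expected '[' ... ']'")

    values: List[dict] = []
    pending: List[str] = []  # comments seen since the last '[' or ','
    last = None              # most recent value not yet followed by ','

    for kind, tok in tokens[1:-1]:
        if kind == "comment":
            # Greedy step: an open value claims the comment.
            (last["after"] if last is not None else pending).append(tok)
        elif kind == "int":
            if last is not None:
                raise ValueError("missing ',' between values")
            last = {"value": int(tok), "before": pending, "after": []}
            pending = []
            values.append(last)
        elif tok == ",":
            if last is None:
                raise ValueError("unexpected ','")
            last = None
        else:
            raise ValueError(f"unexpected {tok!r} (nested arrays not supported)")

    return {"values": values, "array_comments": pending}


if __name__ == "__main__":
    print(parse_array("[\n  1,\n  2 # trailing\n  # more\n]"))
    print(parse_array("[\n  1,\n  2, # trailing\n  # more\n]"))
```

With this policy, the same two comments land on the last value when there is no trailing comma and on the array when there is one. A parser that made the opposite choice in the first case would accept exactly the same documents.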

The ambiguous parse can be eliminated by moving the final ws-comment-newline from array into the [ array-sep ] in array-values.

So without a trailing array-sep, putting the comments in either place is considered valid? In other words, it is up to the parser, which makes it implementation-defined?

Let me know if I should close the issue myself or wait for a maintainer to do that. I don't want to clutter up the issue tracker.

Many thanks for your responses.

> In other words, it is up to the parser, which makes it implementation defined then?

Yup.