Improve diagnostic for incorrectly specified struct literals in the new parser
peterhuene opened this issue · comments
With gauntlet switched over to the new parser, it records this diagnostic:
[[diagnostics]]
document = "aws-samples/amazon-omics-tutorials:/example-workflows/gatk-best-practices/workflows/somatic-snps-and-indels/mutec2.wdl"
message = "mutec2.wdl:112:40: error: expected input section, output section, runtime section, metadata section, parameter metadata section, conditional statement, scatter statement, task call statement, or private declaration, but found `{`"
The source in question is:
Runtime standard_runtime = Runtime {"gatk_docker": gatk_docker,
"cpu": small_task_cpu,
"machine_mem": small_task_mem * 1024,
"command_mem": (small_task_mem * 1024) - 512}
Unfortunately, that's not legal syntax for a struct literal as the member names cannot be quoted.
What the new parser implementation is doing is attempting to recover after what is a valid (syntatically-speaking) expression:
Runtime standard_runtime = Runtime
This is a private declaration with type Runtime
(a type name reference) with name standard_runtime
with expression Runtime
(a name reference). If we were to evaluate that, it would be an error (i.e. Runtime
names a type not a value).
So the next thing to parse is {
, but the parser expects a workflow item, hence the confusing error.
I suggest we improve this during the disambiguation the takes place between a struct literal and a name reference.
When we peek3
and we match {
-> ? -> :
where ? signifies something other than an identifier, we add a single diagnostic to report that a struct literal must use identifiers for member names and recover at the next }
.
So unfortunately it would require that we look ahead as many tokens as required to get past the string.
That means if the second lookahead token is an open single/double quote, we would have to switch into an interpolation context to move the lexer past the string to find the :
token. That's not ideal.
And why is the colon token important? Because these are both syntactically valid:
# A conditional statement comparing literal structs in its expression
if Foo { bar: 1 } == Foo { bar: 1 } {
# ...
}
# A conditional statement with a name reference expression (`Foo`) that has a bound decl.
if Foo {
bar? x = None
}
Without seeing the :
, we can't say that the first Foo
is a name reference or the start of a struct literal expression.
However, I don't think encountering a quote token following identifier
-> {
would be a legal in any context as the only place this occurs is in workflow conditional statements and a string cannot start a workflow statement (only call
, scatter
, if
, an identifier, or type name keyword do).
We could just treat it as an invalid struct literal if we encounter a quote token following the {
, but this other invalid WDL would cause a confusing error complaining about a struct literal:
# This would currently fail to parse with "expected a workflow statement, but found `"`" (we should improve this).
if Foo {
"bar"
}
The :
gives it a much clearer intention that the user thought they were specifying a struct literal.