stjude-rust-labs / wdl

Rust crates for working with Workflow Description Language (WDL) documents.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve diagnostic for incorrectly specified struct literals in the new parser

peterhuene opened this issue · comments

With gauntlet switched over to the new parser, it records this diagnostic:

[[diagnostics]]
document = "aws-samples/amazon-omics-tutorials:/example-workflows/gatk-best-practices/workflows/somatic-snps-and-indels/mutec2.wdl"
message = "mutec2.wdl:112:40: error: expected input section, output section, runtime section, metadata section, parameter metadata section, conditional statement, scatter statement, task call statement, or private declaration, but found `{`"

The source in question is:

Runtime standard_runtime = Runtime {"gatk_docker": gatk_docker,
                                        "cpu": small_task_cpu,
                                        "machine_mem": small_task_mem * 1024,
                                        "command_mem": (small_task_mem * 1024) - 512}

Unfortunately, that's not legal syntax for a struct literal as the member names cannot be quoted.

What the new parser implementation is doing is attempting to recover after what is a valid (syntatically-speaking) expression:

Runtime standard_runtime = Runtime

This is a private declaration with type Runtime (a type name reference) with name standard_runtime with expression Runtime (a name reference). If we were to evaluate that, it would be an error (i.e. Runtime names a type not a value).

So the next thing to parse is {, but the parser expects a workflow item, hence the confusing error.

I suggest we improve this during the disambiguation the takes place between a struct literal and a name reference.

When we peek3 and we match { -> ? -> : where ? signifies something other than an identifier, we add a single diagnostic to report that a struct literal must use identifiers for member names and recover at the next }.

So unfortunately it would require that we look ahead as many tokens as required to get past the string.

That means if the second lookahead token is an open single/double quote, we would have to switch into an interpolation context to move the lexer past the string to find the : token. That's not ideal.

And why is the colon token important? Because these are both syntactically valid:

# A conditional statement comparing literal structs in its expression
if Foo { bar: 1 } == Foo { bar: 1 } {
  # ...
}

# A conditional statement with a name reference expression (`Foo`) that has a bound decl.
if Foo { 
  bar? x = None
}

Without seeing the :, we can't say that the first Foo is a name reference or the start of a struct literal expression.

However, I don't think encountering a quote token following identifier -> { would be a legal in any context as the only place this occurs is in workflow conditional statements and a string cannot start a workflow statement (only call, scatter, if, an identifier, or type name keyword do).

We could just treat it as an invalid struct literal if we encounter a quote token following the {, but this other invalid WDL would cause a confusing error complaining about a struct literal:

# This would currently fail to parse with "expected a workflow statement, but found `"`" (we should improve this).
if Foo {
  "bar"
}

The : gives it a much clearer intention that the user thought they were specifying a struct literal.