Unterminated ifs and loops execute instead of throwing syntax error
maetshju opened this issue · comments
When using control structures like if
and for
, the scripting language parser is not checking for a terminating symbol for the structure before executing the code.
For example, all of the following scripts run without raising a syntax error:
if 1 = 1
writeInfoLine: 1
for i from 1 to 10
writeInfoLine: i
while 1
writeInfoLine: 1
repeat
writeInfoLine: 1
The loops will run the interior code block once. Other programming languages I work with (like Julia, R, and Bash) would raise a syntax error here for not terminating the code block appropriately with equivalents of endif
, endfor
, endwhile
, or until
.
This feels like a bug in how the language grammar is parsed, but I'm not sure if this was perhaps intended for some reason. If this was intended in the parser, I would put forth as a feature request that these unterminated control structures throw syntax errors instead of executing.
I've been reading some of the code in sys/Interpreter.cpp, and I suspect some of the error-checking lines aren't actually being hit.
For example, this line
Line 2407 in f0ee4fa
is guarded by
Line 2385 in f0ee4fa
If the for
loop is written as
for i from 11 to 10
writeInfoLine: 1
to make sure the guard is passed, the Umatched 'for'
line is reached and the Umatched 'for'
error is thrown.
Similarly, writing a while
loop as
i = 2
while i < 2
writeInfoLine: 1
seems to hit
Line 2645 in f0ee4fa
Umatched 'while'
error.I've spent some time looking over how loops are interpreted this afternoon. The general while
interpretation strategy I have understood from Interpreter.cpp is the following (Markdown formatting for this is a bit weird):
- Encounter the keyword
while
- Evaluate the guard after
while
- 2.1. If the guard is true, continue eagerly interpreting each subsequent line.
- 2.1.1. If
endwhile
encountered, then work backward and repeat from 1.
- 2.1.1. If
- 2.2. If the guard is false, find
endwhile
.isomorphicisomorphic- 2.2.1. If
endwhile
found, move past it. - 2.2.2. If
endwhile
not found, throw unmatchedwhile
error.
- 2.2.1. If
- 2.1. If the guard is true, continue eagerly interpreting each subsequent line.
There is, of course, some loop-depth checking and other error checking (e.g., text after endwhile
), but I think I've captured the core strategy. I haven't looked at for
or repeat until
as much, but they seemed similar (and loops are isomorphic to each other). I am also not entirely sure what the calls to the Melder functions do, outside of interpretation as a general concept.
Regardless, the final syntax check for the loop only appears to occur at 2.2.2., after the entire loop has been run and terminated. Ideally, the syntax would be checked at 1. and then would not need to be checked again.
I don't see a simple change to suggest to have the syntax check occur earlier; any substantive change would require a change in the parsing strategy. Some thoughts on how the syntax check could be implemented:
- Do an overall syntax check before executing any line of code. This could be done as part of the scan that finds procedures before executing the rest of the code. In theory, this would make all Praat programs safer since nothing will be executed if there are any syntax errors.
- When encountering
while
(and analogous structures):- Check loop guard and skip over loop if
false
. (As already performed) - Scan for
endwhile
. If not found, throwUnmatched 'while'
- Otherwise, work backward and collect lines between
endwhile
andwhile
. Oncewhile
is reached, execute the collected lines. Repeat from i.
- Check loop guard and skip over loop if
As an alternative in strategy 2, the lines could be collected as part of the scan to find endwhile
, executed after endwhile
is found, and then the interpreter moves back up to the while
statement.
This is all intentional, and meanwhile many scripts will rely on it, e.g. by freely writing text after exit
. Taking a one-line-at-a-time approach used to be crucial when the role of variable substitution (i.e. self-modifying code) was stronger. This doesn't really extend to for/endfor, if/else/endif, while/endwhile and repeat/until, though, so a stricter version of the language could already be devised, apart from breaking some unusual scripts. If we don't want to break too much, we could take your version 2 (scanning for block-ends at runtime), but in a lot simpler way (the "collecting lines" is superfluous).
We would still break the following trick, for which I don't know how often people use it (can you guess what it outputs?):
a = 3
b = 0
while a < 10
a += 1
if a <= 5
b = 2*a
endwhile
endif
writeInfoLine: a, " ", b
We have been thinking about a stricter language, without variable substitution, for some time, the major goal being optimization (by compiling all lines before running the first) rather than syntax checking, but syntax checking would automatically come with it.
(the "collecting lines" is superfluous)
Yes, it is. There are manifold ways a change could be accomplished. The line collection was intended to be as small a modification to the current code as possible since the backward progression is already part of the loop. I personally think it would be preferable to not have to collect the lines, but I am not a dev for this project, so my opinion only matters so much.
(can you guess what it outputs?):
It produces the function of a break
statement, taking advantage of how evaluating the if
condition to false
will advance to the next endif
, thereby moving past endwhile
. I will say that I have often wanted to use explicit break
and continue
keywords in Praat scripts. However, I am not interested in writing code where the blocks are not entirely nested within each other to achieve that functionality.
I can only speak for myself (and slightly for some other users I know), but I would have expected that block of code to error out, rather than be an intended programming trick. I also can't think of any programming language I know or have used or studied where such block mixing is possible (even though that lack of imagination is evidence of nothing much but my own lack of exposure).
I recognize, ultimately, that there is no "good" solution to constraints of keeping old scripts runnable while also changing the execution model. I have witnessed too much of the Python 2 to 3 debacle for too long to think that would be easy. I will say, though, that I am hesitant to write non-trivial Praat scripts knowing that the syntax of control structures is not rigorously checked.
The example above cannot indeed be seriously recommended. The only common use of variable substitution in control structures is "computed goto
", as in
goto 'label$'
This works in Praat, and is used in jump tables, as when you're using the Demo window to create a PowerPoint-like presentation (with Praat capabilities, such as on-the-fly simulations). This run-time computation of where to jump should be kept alive.
For break
and continue
, both very useful, I would like to have an unconditional version and a conditional version, just as already exists for goto
. For instance, one should be able say
if numberOfSamplesLeft = 0
break
endif
but also
break numberOfSamplesLeft = 0
and
continue vowel$ = ""
or would that be clearer if written as?:
break if numberOfSamplesLeft = 0
continue if vowel$ = ""
I feel a bit more live-and-let-live about goto
statements, and I recognize the utility of variable substitution there. I'm generally disinclined from using goto
statements (perhaps I was just told too many times never to use them in computer science classes a decade+ ago), but I think preserving that kind of functionality if it could be of use to someone is fine. And, hopefully, anyone using a goto
is aware of their reputation.
For the conditional break
and continue
, I find the versions using if
easier to read. It's also somewhat reminiscent of how those statements could look in Python as one-liners, e.g., if number_of_samples_left == 0: break
. I wonder, though, if it might be better to have separate keywords breakif
and continueif
. In particular, I'm wondering if someone may try to use an else
clause after break if X
. I'm not sure what semantics I would expect from that besides normal if-else
operation, and else
is superfluous after a break
or continue
.
There is also possibly an option similar to a common shortcut that is written in Julia, where short-circuiting in Boolean evaluation is taken advantage of to write numberOfSamplesLeft == 0 && break
. It could look like if numberOfSamplesLeft = 0 and break
in Praat, though perhaps that's too esoteric.
Yes, I had written break_if
and continue_if
into the previous comment before I decided that there was too much detail there, but these do seem to be clear as well as well as prevent some else
s. We don't use underscores much, but I guess they can be used for other compound keywords as well, as for instance for the parallel_for
that we are planning, or for error strategies (e.g. min(...)
giving an undefined
but min_e(...)
raising an error, if any argument is not a finite number).