Print more information when displaying counterexamples

Question

Print more information when displaying counterexamples

sauclovian-g opened this issue 4 months ago · comments

In both the cases where an override spec fails to match, and when a top-level spec fails its postcondition, the information printed does not in general include enough information to figure out what went wrong.

SAW prints the input model for the top-level spec. Quite apart from this often itself not being useful (see #2049) it forces the user to manually derive or guess at values for other terms participating in the problem. For example, in the case of a failing override spec, the arguments to the overridden function are not printed. In a failing top-level spec, the output values of ghost variables are not printed. Furthermore, SAW does not indicate which precondition or postcondition assertions fail and which pass; the user must guess at this or determine it experimentally by commenting assertions out.

Meanwhile, trying to manually derive these values is at best error-prone, and difficult for anyone not well versed in formal methods. Furthermore, there's a strong tendency for the value thus derived to be the value one expects, whether or not that is the value that actually appears; this denies the user the ability to notice inconsistencies between their expectations and actuality. This in turn makes debugging specs vastly harder.

SAW should really print all of the following besides the initial input model:

The values of all arguments to a failing override
The input values of all ghost state used by a failing override
Which pertinent assertions (preconditions of an override or postconditions of a top-level spec) are false
The argument values and return value of at least the previously applied override and ideally all overrides used
The input and output values of all ghost state used by at least the previously applied override and ideally all overrides used

Furthermore, ideally it should print values for all the bound variables of the failing spec, and maybe also of the last applied override. This might end up being a lot of duplicative material, depending on the internal representation and the complexity of the operations that bind those variables; it might be worth trying to filter based on that complexity.

Note that extracting all this material is likely not trivial (at a minimum it means re-executing a bunch of stuff and it might require further solver runs to generate example values) ... but this is what you need to diagnose bugs in complex specs and models, especially in a noninteractive context where you can't step through the execution like you can in a proof assistant.