Test reporting can cause test code to run twice

Question

gfredericks opened this issue 6 years ago

Some code in the ultra.test.logic namespace can cause test code to be run twice, by inserting it unquoted two times in the macroexpansion.

For example this test prints :foo two times:

(deftest a-test
  (is (not (println :foo))))

Note that the not there is important, as this behavior is (I assume) only triggered by the call to one of the three logic-ops mentioned in ultra.test.logic.

I observed that changing line 16 of that file to (list 'quote form) seems to fix it, but I haven't internalized the logic in that namespace enough to convince myself that that's the correct fix.
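The double evaluation described above can be reproduced in isolation. The following is a minimal sketch, not the code in ultra.test.logic: the macro `naive-report`, the counter `calls`, and the function `side-effect` are hypothetical names made up for illustration. It assumes only that the reporting macro splices the test form's arguments into its expansion a second time, unquoted.

```clojure
;; Hypothetical names throughout; this is NOT the ultra.test.logic source.
(def calls (atom 0))

(defn side-effect
  "Counts how many times it is called, then returns nil."
  []
  (swap! calls inc)
  nil)

(defmacro naive-report
  "Evaluates the form for its result, then splices the form's arguments
  into the expansion a second time to build the 'actual' message."
  [[op & args :as form]]
  `{:result ~form                  ; first evaluation of the arguments
    :actual (list '~op ~@args)})   ; arguments evaluated again here

(def report (naive-report (not (side-effect))))

report ;; => {:result true, :actual (not nil)}
@calls ;; => 2
```

Because `~@args` appears in the expansion in addition to `~form`, every argument expression runs once per appearance, which is why a side effect such as `println` shows up twice.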

David Jarvis · Answer 1 · Fri Feb 16 2018 21:21:47 GMT+0800 (China Standard Time)

@timothypratley wrote the logic in that namespace and I'll confess to not being quite as familiar with it. That having been said your change seems like a reasonable one and as far as I can tell doesn't break anything, so I'm happy to just make the change for now and if it causes any issues we can back it out later.

Gary Fredericks · Answer 2 · Fri Feb 16 2018 21:24:54 GMT+0800 (China Standard Time)

Sounds good, thanks!

David Jarvis · Answer 3 · Fri Feb 16 2018 21:42:32 GMT+0800 (China Standard Time)

Re-opening - unfortunately, while your fix resolves the issue it also stops the logic there from functioning properly. I think we'll need @timothypratley to take a closer look at this and see what can be done.

The bottom line here is that repeated evaluation of a form that triggers side effects is obviously unacceptable in tests. So if we can't fix this we'll need to disable the logic evaluator because it'll introduce wonky behavior to people's tests.

David Jarvis · Answer 4 · Fri Feb 16 2018 21:44:24 GMT+0800 (China Standard Time)

Ah bollocks, GitHub optimistically closed this when I reverted the original commit. Sigh.

Gary Fredericks · Answer 5 · Fri Feb 16 2018 23:13:07 GMT+0800 (China Standard Time)

Does the change actually break baseline clojure.test functionality, or merely disable some of the ultra features?

David Jarvis · Answer 6 · Fri Feb 16 2018 23:16:41 GMT+0800 (China Standard Time)

I don't think either of those is really correct.

Your proposed change causes the logic evaluator to fail and to just report the quoted result. Try running Ultra's demonstrative tests with lein test :demo with and without your change to see what I mean.

That said, the logic evaluator's current behavior of triggering side effects multiple times is not acceptable. So it needs to be fixed because that will end up causing a headache for people.

Timothy Pratley · Answer 7 · Mon Feb 19 2018 10:06:31 GMT+0800 (China Standard Time)

I'll take a look into it :)

Timothy Pratley · Answer 8 · Mon Feb 19 2018 11:07:29 GMT+0800 (China Standard Time)

TLDR: I think we should remove ultra.test.logic. I'll submit a pull request to that effect shortly.

Sorry for the headaches this caused you, @gfredericks. I can imagine that if you are posting about it here, it must have been tricky to identify and caused some weird pains.

The feature provided by ultra.test.logic relies explicitly on re-evaluation of the leaves of logic branches.

To recap briefly on the purpose of the ultra.test.logic namespace:

Instead of opaque failures:
expected: (and (:name pirate) (or (empty? pirate) (:age pirate)))
actual: nil

See what part of the logical expression does not meet your expectation:
expected: (and (:name pirate) (or (empty? pirate) (:age pirate)))
actual: (not (and "Edward Teach" (or false nil)))

This output is very useful for understanding why the test failed.

But it cannot be produced without evaluating the inner parts of the logic expressions. The test must evaluate the expression to nil, and the "help annotator" must evaluate the non-logic portions individually while preserving the logic structure.

In this particular example, (:name pirate) is evaluated to "Edward Teach", (empty? pirate) is evaluated to false and (:age pirate) is evaluated to nil, to construct the help message: (not (and "Edward Teach" (or false nil))). This clearly shows that the (:age pirate) is the branch of logic causing the test to fail. Quite helpful for understanding why the test fails, but not achievable without evaluation. Given that this feature is not possible to provide without this behavior I recommend removing the ultra.test.logic features.
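As a rough illustration of the "help annotator" idea, here is a minimal sketch of a macro that keeps the `and`/`or`/`not` structure as data while evaluating the leaves. The names `logic-ops`, `annotate-form`, and `annotate` are hypothetical and are not taken from ultra.test.logic. Note that this sketch evaluates every leaf eagerly, so it ignores short-circuiting, and it produces only the inner annotated expression, not the surrounding `(not ...)` of the failure message.

```clojure
;; Hypothetical sketch; names are illustrative, not ultra's API.
(def logic-ops '#{and or not})

(defn annotate-form
  "Rewrites a logic form into code that rebuilds the same structure as a
  list, with every non-logic leaf left in place to be evaluated."
  [form]
  (if (and (seq? form) (contains? logic-ops (first form)))
    `(list '~(first form) ~@(map annotate-form (rest form)))
    form))

(defmacro annotate [form]
  (annotate-form form))

(def pirate {:name "Edward Teach"})

(def annotated
  (annotate (and (:name pirate) (or (empty? pirate) (:age pirate)))))

annotated ;; => (and "Edward Teach" (or false nil))
```

If the test macro also evaluates the original expression to get the pass/fail result, the leaves run twice, which is exactly the problem reported in this issue.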

Timothy Pratley · Answer 9 · Mon Feb 19 2018 11:22:40 GMT+0800 (China Standard Time)

Gosh, as soon as I wrote that message I realized that there is a perfectly reasonable way to make the feature work, and not double evaluate. Instead of evaluating the result up front, the leaves of the logic branch can be evaluated as part of creating the result.

Current (unacceptable) implementation:
calculate result: evaluate overall expression
build helpful message: traverse expression evaluating the non-logic parts

Possible implementation:

traverse logic expression evaluating the non-logic parts
preserve the logic tree for the helpful message if needed
evaluate the logic tree (that already has evaluated non-logic parts)... no additional evaluation
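The last step of that plan, evaluating a logic tree whose leaves are already values, needs no further evaluation of user code. A minimal sketch follows, assuming the tree is a plain list such as `(and "Edward Teach" (or false nil))`; the function name `eval-tree` is hypothetical. One known limitation of this sketch: a leaf whose value is itself a list starting with `and`, `or`, or `not` would be misread as logic structure.

```clojure
;; Hypothetical sketch; eval-tree is an illustrative name.
(defn eval-tree
  "Interprets a logic tree whose leaves are already-evaluated values.
  No user code runs here, so no side effect can be repeated."
  [tree]
  (if (and (seq? tree) (contains? '#{and or not} (first tree)))
    (let [[op & args] tree
          results (map eval-tree args)]
      (case op
        and (reduce (fn [acc v] (and acc v)) true results)
        or  (reduce (fn [acc v] (or acc v)) nil results)
        not (not (first results))))
    tree))

(eval-tree '(and "Edward Teach" (or false nil))) ;; => nil
(eval-tree '(or false (not nil)))                ;; => true
```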

Hmmm

Timothy Pratley · Answer 10 · Tue Feb 20 2018 03:30:09 GMT+0800 (China Standard Time)

Unfortunately I don't see a way to handle or.

I can imagine resolving all the inner expressions in let bindings,
which suffices for and:

(and (= 1 2) :b)
(let [a (= 1 2), b :b, result (and a b), help (list 'and a b)] ...)

but for or we would need to avoid binding the inner part unless it was needed:

(or true (side-effect))

cannot be represented as

(let [a true, b (side-effect)] (or a b))

because (side-effect) will occur.
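The difference is easy to observe with a counter. In this small sketch, `calls` and `side-effect` are hypothetical names used only for the demonstration.

```clojure
;; Hypothetical names; demonstrates eager let bindings vs. short-circuit or.
(def calls (atom 0))

(defn side-effect []
  (swap! calls inc)
  :done)

;; or short-circuits, so the second argument never runs.
(or true (side-effect))
(def calls-after-or @calls)   ;; => 0

;; let evaluates every binding up front, so the side effect happens
;; even though or never needs the value of b.
(let [a true
      b (side-effect)]
  (or a b))
(def calls-after-let @calls)  ;; => 1
```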

So I'm back to we need to remove it :)

Gary Fredericks · Answer 11 · Tue Feb 20 2018 04:12:44 GMT+0800 (China Standard Time)

could delay be helpful here?
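One way the `delay` idea could look, as a sketch only: wrap each inner expression in a `delay`, force them through `or` so short-circuiting is preserved, and use `realized?` to decide what to show in the help message. The names `calls`, `side-effect`, `shown`, and the `:not-evaluated` placeholder are assumptions made for this example.

```clojure
;; Hypothetical sketch of the delay approach; names are illustrative.
(def calls (atom 0))

(defn side-effect []
  (swap! calls inc)
  :done)

(defn shown
  "Value of a delay if it was forced, otherwise a placeholder."
  [d]
  (if (realized? d) @d :not-evaluated))

(def outcome
  (let [a      (delay true)
        b      (delay (side-effect))
        result (or @a @b)                      ; b is never forced here
        help   (list 'or (shown a) (shown b))]
    {:result result :help help :calls @calls}))

outcome ;; => {:result true, :help (or true :not-evaluated), :calls 0}
```

Each delay body runs at most once, so the result and the help message share the same evaluation.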

David Jarvis · Answer 12 · Tue Feb 20 2018 06:22:36 GMT+0800 (China Standard Time)

delay isn't a terrible idea. The obvious thing to me is that these tests do trigger the side effects when they're run, and that we're in a position to capture the pre-evaluated test forms as well as the ultimately evaluated test forms. So as far as I can tell there's no theoretical reason why this isn't possible, but (a) there may be quite a bit of work involved and (b) there may be performance implications. If we want to pull the namespace first and then figure out how to resolve those issues, that's fine, but I'm also okay with us working on them in a more iterative fashion. After all, at least so far nobody has shown up to complain about multiple evaluation of side effects.

Timothy Pratley · Answer 13 · Tue Feb 20 2018 10:14:00 GMT+0800 (China Standard Time)

Yup; I've opened #85 to temporarily disable the bad behavior... I will try to cook up a proper solution maybe later this week. I agree delay seems like a good idea. Another idea it prompted is that I can make the chained let approach work simply by doing a conditional check whether the previous binding would have short-circuited:

(let [a# first-or-expression, b# (if a# :not-evaluated second-or-expression)] ...)

I think this might be pretty straightforward, but won't get a chance to play around with it until later in the week.
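A minimal sketch of that conditional-binding idea for a two-argument `or` is shown below. The macro name `or-once`, the helper `side-effect`, and the counter `calls` are hypothetical; the sketch assumes that `:not-evaluated` is an acceptable placeholder, which becomes ambiguous if a test expression can legitimately return that keyword.

```clojure
;; Hypothetical sketch of the chained-let approach; or-once is illustrative.
(def calls (atom 0))

(defn side-effect []
  (swap! calls inc)
  :done)

(defmacro or-once
  "Two-argument or that evaluates each argument at most once and also
  returns a help form built from the values it actually computed."
  [x y]
  `(let [a# ~x
         b# (if a# :not-evaluated ~y)]   ; skip y when or would short-circuit
     {:result (if a# a# b#)
      :help   (list '~'or a# b#)}))

(def short-circuited (or-once true (side-effect)))
(def calls-after-first @calls)   ;; => 0

(def fell-through (or-once false (side-effect)))
(def calls-after-second @calls)  ;; => 1

short-circuited ;; => {:result true, :help (or true :not-evaluated)}
fell-through    ;; => {:result :done, :help (or false :done)}
```

The same pattern would apply to `and` with the condition inverted, binding the second expression only when the first is truthy.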

David Jarvis · Answer 14 · Wed Feb 28 2018 02:16:21 GMT+0800 (China Standard Time)

I'm going to close this issue as we're pulling the feature for the time being which will get rid of the double-evaluation issue. This behavior will be gone in the next release.