pester / Pester

Pester is the ubiquitous test and mock framework for PowerShell.

Home Page: https://pester.dev/

Support shuffling / randomized test runs

twhiting opened this issue

Summary of the feature request

Google Test has a feature where tests can be shuffled randomly per test run. This is beneficial in integration test scenarios where one test might fail as a result of another previously run test. Frankly, it helps find obscure bugs.

How should it work?

See the Google Test shuffle documentation for a good description: http://google.github.io/googletest/advanced.html#shuffling-the-tests
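For reference, the linked documentation describes two command-line flags: one that enables shuffling and one that fixes the seed so a particular order can be reproduced. A typical invocation (the test binary name here is only illustrative) looks like:

```
./my_integration_tests --gtest_shuffle --gtest_random_seed=12345
```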

Interesting. That would be a bit difficult (and inefficient) to provide, because Pester relies on execution that is "top-down", so if we mix tests that come from 2 different files we have to "restore" the state for each test, and run all setups and teardowns again, breaking some of the life cycles that people probably rely on. This would be especially painful for mocks, I think.

I am not against it completely, I would just like a bit more data showing why doing this is worth it.

Yes it would be limited to shuffling tests (It) inside the current block (Context/Describe). Is that enough to be useful?

Shuffling containers/files is also an option, though that can already be achieved manually.
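For example, a file-level shuffle can already be done with plain PowerShell before handing the files to Invoke-Pester; a rough sketch (the paths and seed value are only placeholders):

```powershell
# Shuffle the test *files* manually; Pester then runs them in the given order.
$seed = 12345                          # fixed seed so a failing order can be reproduced
Get-Random -SetSeed $seed | Out-Null   # seed PowerShell's session RNG
$files = Get-ChildItem -Path ./tests -Filter *.Tests.ps1 |
    Sort-Object { Get-Random }         # random sort key => shuffled file order
Invoke-Pester -Path $files.FullName -Output Detailed
```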

Yes, Google Test shuffles within a "test suite", which is the behavior that @fflaten describes.

It is hard to explain why it's worth it without being super abstract. For example, I have a PowerShell module that interfaces with a Windows service. It is possible that test C can cause a state within the service that test A exposes. It is VERY hard to track down these cases without a reliable randomizer.

The second part of this is being able to run the tests again in the same shuffled order.

Google Test can shuffle based on a seed. So say a test fails in CI when shuffled with seed X. I can then manually trigger a new run with seed X and get the exact same test run order for all tests.

For a good description of the seed mechanism, see the Google Test link I posted above.

Makes sense. If we randomize within the same container, in a top-down way (that is, we randomize on each level, but don't jump up and down), then it is hardly any change from the current way of running tests; all you need to do is shuffle the tests in the discovered tree (in a deterministic way). I think there is even an “order” list already in the discovered tree.
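To make that concrete, here is a minimal, self-contained sketch of a per-level, seeded shuffle. This is not Pester's internal code; the Tests/Blocks property names and the toy tree are assumptions for illustration only:

```powershell
# Shuffle children at each level of a discovered test tree, deterministically per seed.
function Invoke-LevelShuffle {
    param($Block, [System.Random]$Rng)

    # Shuffle this block's own tests; never move them into another block.
    $Block.Tests = @($Block.Tests | Sort-Object { $Rng.Next() })

    # Recurse into child blocks, shuffling each level independently.
    foreach ($child in $Block.Blocks) { Invoke-LevelShuffle -Block $child -Rng $Rng }
}

# Toy "discovered tree": a Describe containing two Contexts.
$tree = [pscustomobject]@{
    Tests  = @()
    Blocks = @(
        [pscustomobject]@{ Tests = @('It 1', 'It 2', 'It 3'); Blocks = @() }
        [pscustomobject]@{ Tests = @('It A', 'It B');         Blocks = @() }
    )
}

$rng = [System.Random]::new(42)   # fixed seed => identical order on every run
Invoke-LevelShuffle -Block $tree -Rng $rng
$tree.Blocks | ForEach-Object { $_.Tests -join ', ' }
```

Running it twice with the same seed gives the same order, which is what would make a failing shuffled run reproducible.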

@twhiting do you want to make a PR for this? Even if it's just a proof of concept.

@nohwnd I'd love to get to the feature myself, but realistically it'd be weeks/months. So if this is something you are interested in, feel free to jump on it!

Thanks.

I came here to see if there is a flag to do this.

This is not hard to explain at all. I have a lot of experience with this, with a large number of people at all levels of seniority and thousands of Pester tests. It's basically the norm, rather than the exception, that people will depend on other tests within the same file. They may do it intentionally or by accident. Writing good tests is hard, for sure.

When tests always execute in the same order, you are prone to treating previous tests as seeding the environment, or acting as if they are direct prerequisites of subsequent tests. I have seen it all:

  1. Test A creating an object, test A+1 doing something with that object
  2. Test A changing the fixture, test A+N depending on that change
  3. Test A editing objects in a fuzzy manner, test B failing when it encounters such objects because its filtering is not specific enough

This almost always happens with test cases within the same file, although it does happen across files too.
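A hypothetical Pester snippet showing pattern 1 from the list above: the second It only passes when the first one has already run, so a shuffled order would expose the hidden dependency immediately.

```powershell
Describe 'Widget store' {
    BeforeAll { $script:store = @{} }

    It 'creates a widget' {
        $script:store['w1'] = [pscustomobject]@{ Name = 'w1'; Size = 3 }
        $script:store.Count | Should -Be 1
    }

    It 'resizes the widget' {
        # Implicitly depends on the previous test having created 'w1'.
        $script:store['w1'].Size = 5
        $script:store['w1'].Size | Should -Be 5
    }
}
```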

One discovers this, usually by accident, when some tests are changed, skipped or removed, or simply through the passage of time if the tests are fuzzy in nature (my typical case). To fight some of this, I have a Grafana dashboard of all previous runs, and I can select a particular test and see how often it failed during the previous months.

[Screenshot: Grafana dashboard showing one suspicious test failure in the last 30 days]

This is IMO mandatory stuff to have. Randomization will either solve those problems or make them detectable quicker. I ask devs, before committing tests to master, to provide proof of at least hundreds of runs by using something similar to `1..100 | % { Invoke-Pester ... }`, which would work much better with randomization included.
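As a rough sketch of that loop (the path and run count are placeholders), something like this counts how many of the repeated runs had at least one failure:

```powershell
# Run the suite repeatedly and collect the result objects.
$results = 1..100 | ForEach-Object {
    # -PassThru returns the run result; -Output None keeps the console quiet.
    Invoke-Pester -Path ./tests -PassThru -Output None
}

# Count runs where at least one test failed.
$failedRuns = @($results | Where-Object FailedCount -gt 0).Count
"Runs with at least one failing test: $failedRuns / $($results.Count)"
```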