Maximum coverage in minimal time

Question

masaccio opened this issue a year ago · comments

Summary

Given a project where tests have been added incrementally over time and there is a significant amount of overlap between tests, I'd like to be able to generate a list of tests that achieves maximum coverage in minimal time. Clearly this is a pure code-coverage approach and doesn't guarantee that functional coverage is maintained, but it could be a good way to identify redundant tests.

I have a quick proof-of-concept that's not integrated into pytest that:

- runs all tests with pytest-cov and --durations=0
- processes CoverageData and the output of --durations=0 to generate a list of arcs/lines that are covered for each context
- reduces the list of subsets using the set cover algorithm
- optionally applies a coverage 'confidence' in the event you want a faster smoke test that has reduced coverage (say 95%).
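The reduction step can be sketched as a greedy, duration-weighted set cover. This is a minimal illustration and not the code from the proof of concept: the function name `greedy_cover`, the dict-based inputs, and the "newly covered items per second" scoring rule are all assumptions made for the example. Greedy set cover is an approximation, so the result is not guaranteed to be the fastest possible subset.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(
    coverage: Dict[str, Set[Hashable]],
    durations: Dict[str, float],
    confidence: float = 1.0,
) -> List[str]:
    """Pick tests greedily by newly covered items per second of runtime.

    coverage maps a test ID to the lines/arcs it covers; durations maps a
    test ID to its runtime in seconds. Both shapes are hypothetical.
    """
    universe: Set[Hashable] = set().union(*coverage.values())
    # Stop once this many items are covered (e.g. 95% of the universe).
    target = confidence * len(universe)
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    remaining = dict(coverage)

    def score(test: str) -> float:
        gain = len(remaining[test] - covered)
        # Assumption: a test with no recorded duration costs one second.
        return gain / max(durations.get(test, 1.0), 1e-9)

    while remaining and len(covered) < target:
        best = max(remaining, key=score)
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


example_coverage = {
    "test_a": {1, 2, 3},
    "test_b": {2, 3},
    "test_c": {4},
    "test_d": {1, 2, 3, 4},
}
example_durations = {"test_a": 1.0, "test_b": 1.0, "test_c": 0.1, "test_d": 5.0}

print(greedy_cover(example_coverage, example_durations))
print(greedy_cover(example_coverage, example_durations, confidence=0.25))
```

In this toy input, `test_b` is redundant and `test_d` is too slow for what it adds, so neither is selected at full confidence.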

I am happy to work on a PR and include tests, but before I do, I wanted to gauge whether this fits your project's goals. If you'd rather not have this feature, I can always create a separate plugin for people who want it.

Ned Batchelder · Answer 1 · Thu Aug 10 2023 22:28:24 GMT+0800 (China Standard Time)

It sounds interesting! Do you have a link to your proof of concept?

Jon Connell · Answer 2 · Thu Aug 10 2023 23:28:11 GMT+0800 (China Standard Time)

Sure: it's pretty basic code in this gist and a lash-up in terms of integration, but it tells me this is worth looking at for a broader set of packages using pytest.

But this works as expected on the tests I was trying to optimise:

# Reports 100% coverage from 140 tests:
poetry run pytest --cov=src/numbers_parser \
    --cov-report=term-missing:skip-covered \
    --cov-context=test \
    --durations=0 -n logical | tee duration.txt
# Generate the cover set
poetry run python3 maxcov.py > maxcov.txt
# Reports 100% coverage from 91 tests:
poetry run pytest --cov=src/numbers_parser \
    --cov-report=term-missing:skip-covered \
    --cov-context=test \
    -n logical `cat maxcov.txt`

The reduced run takes a bit over half as long. The full run isn't very long for this package anyway, but still.

Ned Batchelder · Answer 3 · Sat Aug 12 2023 06:42:01 GMT+0800 (China Standard Time)

I think this is very interesting! I don't see a reason to add it into pytest-cov though: it operates after the entire test run. If you package it independently, it can be used by people who don't use pytest-cov.

I tried running the code and ran into a few issues: I needed -vvv to get all the durations; splitting the lines needed maxsplit=2 because my parameterized test names sometimes have spaces in them; and I was using a data file other than .coverage. In the end, my output was "none", though I'm not sure whether that's because I wasn't capturing the contexts correctly.
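To illustrate the maxsplit=2 point, here is a small sketch of parsing duration lines so that test IDs containing spaces stay intact. The helper name `parse_durations` and the decision to sum the setup, call, and teardown phases per test are assumptions for this example, not taken from the gist; the line layout (`<seconds>s <phase> <node id>`) is what pytest prints in its durations report.

```python
from collections import defaultdict
from typing import Dict, Iterable

PHASES = {"setup", "call", "teardown"}


def parse_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the reported seconds per test ID across all phases."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        # maxsplit=2 keeps everything after the phase as one field, so a
        # parametrized ID such as test_b[hello world] is not cut in two.
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue
        seconds, phase, node_id = parts
        if phase not in PHASES or not seconds.endswith("s"):
            continue  # header, summary, or unrelated output
        try:
            value = float(seconds[:-1])
        except ValueError:
            continue
        totals[node_id.strip()] += value
    return dict(totals)


sample = [
    "============ slowest durations ============",
    "0.50s call     tests/test_x.py::test_a",
    "0.25s setup    tests/test_x.py::test_a",
    "0.01s call     tests/test_x.py::test_b[hello world]",
]
parsed = parse_durations(sample)
print(parsed)
```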

Jon Connell · Answer 4 · Mon Sep 25 2023 02:47:21 GMT+0800 (China Standard Time)

I'll close this issue since I have started development on https://pypi.org/project/pytest-maxcov/. I need to unpick how well this can work with contexts, given that measurement needs --cov-context=test.
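One detail that may matter when working with contexts: as far as I can tell, --cov-context=test records a separate context per test phase, named as the test's node ID followed by `|setup`, `|run`, or `|teardown`. That naming is an assumption here and worth checking against your own data file. Under that assumption, a sketch of folding the per-phase contexts back into one entry per test could look like this (the helper name `merge_phase_contexts` and the input shape are hypothetical):

```python
from typing import Dict, Hashable, Set

# Assumed phase suffixes; verify against the contexts in your data file.
PHASES = {"setup", "run", "teardown"}


def merge_phase_contexts(
    items_by_context: Dict[str, Set[Hashable]],
) -> Dict[str, Set[Hashable]]:
    """Combine 'node_id|phase' contexts into one coverage set per node ID."""
    merged: Dict[str, Set[Hashable]] = {}
    for context, items in items_by_context.items():
        if not context:
            continue  # the empty default context belongs to no test
        node_id, sep, phase = context.rpartition("|")
        if not sep or phase not in PHASES:
            node_id = context  # no recognised suffix, keep the name as-is
        merged.setdefault(node_id, set()).update(items)
    return merged


raw = {
    "": {1},
    "tests/test_x.py::test_a|setup": {2},
    "tests/test_x.py::test_a|run": {3, 4},
    "tests/test_x.py::test_b[a|b]|run": {5},
}
merged = merge_phase_contexts(raw)
print(merged)
```

Using `rpartition` means only the last `|` is treated as the separator, so a `|` inside a parametrized test ID is preserved.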

@nedbat can you think of a reason why running pytest-cov in a subprocess in a different directory would interfere with coverage in the parent process? I am seeing pretty much zero coverage in the plugin's pytest run and I'm wondering whether .coverage is getting clobbered.

Ned Batchelder · Answer 5 · Tue Sep 26 2023 02:01:11 GMT+0800 (China Standard Time)

@masaccio I'm interested to see where maxcov goes. I'm sorry, but I don't know enough about the internals of pytest-cov to know whether there's interference like you might be seeing.