Erotemic / xdoctest

A rewrite of Python's builtin doctest module (with pytest plugin integration) with AST instead of REGEX.



Feature request: run tests and write results as expected output

Zac-HD opened this issue

The most annoying thing about doctests is having to manually update them all when something changes the repr of your outputs, or a random seed, etc. If there was an xdoctest mode which wrote the output instead of comparing it, I'd switch over tomorrow 😀

Well, prepare to switch. If you don't specify something that you "want", then it won't compare it. Alternatively, if you really want to specify a "want" string, you can add # xdoctest: +IGNORE_WANT as a global directive (i.e. place it on its own line) to prevent the output checker from checking anything.

For example:

   >>> print('hi')

passes

   >>> print('hi')
   hello 

fails

   >>> # xdoctest: +IGNORE_WANT
   >>> print('hi')
   hello 

passes

There is one feature that doesn't currently exist that might make this even simpler. Currently xdoctest.directive.DEFAULT_RUNTIME_STATE statically initializes the defaults of all directives. I think it would be useful if a command line flag could make the default for a directive (e.g. IGNORE_WANT) always on. That would effectively correspond to the ignore mode you are talking about.

BTW, you can hack this by using xdoctest --global-exec="import xdoctest; xdoctest.directive.DEFAULT_RUNTIME_STATE['IGNORE_WANT'] = True" -m <your_modname>, which will run that line of code before every doctest.
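For completeness, the same hack can be done from Python rather than the shell. This is only a sketch: xdoctest.doctest_module is the runner entry point, but its exact signature (and the command='all' argument) may differ between xdoctest versions.

    import xdoctest

    # Make every doctest behave as if "# xdoctest: +IGNORE_WANT" were set.
    xdoctest.directive.DEFAULT_RUNTIME_STATE['IGNORE_WANT'] = True

    # 'your_modname' is a placeholder for the module whose doctests you want to run.
    xdoctest.doctest_module('your_modname', command='all')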

Also note that dealing with minor repr issues is the reason I wrote ubelt.repr2, which strives to produce consistent, deterministic output across all Python versions.
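As a small illustration of the kind of normalized output I mean (the keyword arguments shown are from memory and may differ slightly by ubelt version):

    import ubelt as ub

    data = {'b': [2.00000000001, 3], 'a': 'hi'}
    # sort=1 orders dict keys, precision truncates floats, and nl=1 puts each
    # item on its own line, so the same data renders the same text everywhere.
    print(ub.repr2(data, nl=1, precision=2, sort=1))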

For random seeds, I strongly recommend using a local, manually seeded rng = random.Random(0) or rng = np.random.RandomState(0) instead of the global random state. You might also look at kwarray.ensure_rng as an easy way to coerce None, a number, or an existing random state into a local random state object.
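For example, a doctest that seeds a local state instead of the global one (the kwarray call reflects my understanding of its API and is illustrative only):

    >>> import numpy as np
    >>> rng = np.random.RandomState(0)   # local state; the global np.random seed is untouched
    >>> values = rng.rand(3)             # deterministic for a fixed seed
    >>> values.shape
    (3,)
    >>> # kwarray.ensure_rng coerces None, an int seed, or an existing state
    >>> import kwarray
    >>> rng2 = kwarray.ensure_rng(0)
    >>> rng2.rand(3).shape
    (3,)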

I use the previous two techniques quite a bit, but when they fail I fall back on # xdoctest: +IGNORE_WANT

Imagine we have a doctest like the following:

>>> from datetime import date
>>> date.today().year
2020

This is fine for this year, but next January I would like to be able to run a command like xdoctest --update-inplace and have it edit my file to replace 2020 with 2021.

This is a contrived example, obviously, but I always want my example outputs to be checked (the opposite of +IGNORE_WANT), and have an option to update them in place instead for when I expect them to have changed.

Re: reprs and seeding

I was thinking more of intentional updates, such as adding a new attribute to an object.

I agree that seeding your PRNG is essential, but I still often need to change my doctest outputs when, e.g., I change a function in some simulation code and it calls Random.xxx() a different number of times. Even with the same seed, small code changes can change the PRNG alignment, and then the results look pretty different.

It's things like this, where even seeded random results are very very sensitive to code changes, which would make a zero-effort update command so useful 🙂 As-is, I've just had to avoid using doctests for any output that I expect might change because having example outputs in our docs just isn't worth the trouble.

So my feeling on intentional updates is that it's a normal part of code maintenance. If you update the code in a way that modifies behavior or adds an additional attribute, it seems reasonable that part of that update would also involve making changes to tests. Either that, or tests could be written in such a way that they only check that expected components of the output exist and allow new components to pass through (a small example follows). I've certainly encountered this problem before, but I haven't felt a dire need to automate this part of the development cycle.
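For instance, a doctest can assert on just the components it cares about rather than pinning the entire repr (the dict here is a made-up stand-in for a real result object):

    >>> result = {'n_true': 10, 'n_pred': 12, 'extra_debug': 'noise'}  # stand-in output
    >>> result['n_true']     # check the components you expect
    10
    >>> 'n_pred' in result   # new keys can appear without breaking the test
    True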

I'm worried about two cases:

  • Incorrect output blindly being put in the code. Part of the reason why I like manually updating "want" strings when things change is it allows me as a maintainer to gut-check if that change is reasonable or not. I think saying "all my tests are correct, document the output" is a pretty unsafe thing to do.

  • Superfluous output in the code. I recently had a case that demonstrates this point.

My doctest:

        Example:
            >>> # xdoctest: +REQUIRES(module:ndsampler)
            >>> from netharn.metrics import DetectionMetrics
            >>> dmet = DetectionMetrics.demo(
            >>>     nimgs=10, nboxes=(0, 10), n_fp=(0, 1), n_fn=(0, 1), nclasses=3, cls_noise=.2)
            >>> cfsn_vecs = dmet.confusion_vectors()
            >>> cm = cfsn_vecs.confusion_matrix()
            >>> print(cm.to_string(float_format=lambda x: '%.2f' % x))
            pred        background  cat_1  cat_2  cat_3
            real
            background           0      1      1      1
            cat_1                2     12      0      1
            cat_2                2      0     14      1
            cat_3                1      0      1     17

failed because I had changed the output from integers to floats, so it wanted a bunch of .00 at the end of each number. However, I had also added extra loop reporting output because the function was taking a long time for larger inputs.

Therefore, if I had auto-changed the "want" string to what the doctest "got", I would have had this mess:

        Example:
            >>> # xdoctest: +REQUIRES(module:ndsampler)
            >>> from netharn.metrics import DetectionMetrics
            >>> dmet = DetectionMetrics.demo(
            >>>     nimgs=10, nboxes=(0, 10), n_fp=(0, 1), n_fn=(0, 1), nclasses=3, cls_noise=.2)
            >>> cfsn_vecs = dmet.confusion_vectors()
            >>> cm = cfsn_vecs.confusion_matrix()
            >>> print(cm.to_string(float_format=lambda x: '%.2f' % x))
            submit assign detection jobs 10/10... rate=55315.55 Hz, eta=0:00:00, total=0:00:00, wall=12:47 EST
            assign detections 10/10... rate=2743.98 Hz, eta=0:00:00, total=0:00:00, wall=12:47 EST
            ndarray convert 9/9... rate=53693.84 Hz, eta=0:00:00, total=0:00:00, wall=12:47 EST
            pred        background  cat_1  cat_2  cat_3
            real                                       
            background        0.00   1.00   1.00   1.00
            cat_1             2.00  12.00   0.00   1.00
            cat_2             2.00   0.00  14.00   1.00
            cat_3             1.00   0.00   1.00  17.00

(The auto-updated doctest would also fail again, because a timestamp appeared in the new "want" string.)

Instead of the desired:

        Example:
            >>> # xdoctest: +REQUIRES(module:ndsampler)
            >>> from netharn.metrics import DetectionMetrics
            >>> dmet = DetectionMetrics.demo(
            >>>     nimgs=10, nboxes=(0, 10), n_fp=(0, 1), n_fn=(0, 1), nclasses=3, cls_noise=.2)
            >>> cfsn_vecs = dmet.confusion_vectors()
            >>> cm = cfsn_vecs.confusion_matrix()
            ...
            >>> print(cm.to_string(float_format=lambda x: '%.2f' % x))
            pred        background  cat_1  cat_2  cat_3
            real
            background        0.00   1.00   1.00   1.00
            cat_1             2.00  12.00   0.00   1.00
            cat_2             2.00   0.00  14.00   1.00
            cat_3             1.00   0.00   1.00  17.00

Note that xdoctest's got/want check only verifies the trailing output lines, so the ... isn't really needed.
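For instance, something like the following should pass under that trailing-line behavior (as I understand it; the first print stands in for incidental progress chatter):

    >>> print('assign detections 10/10... (progress chatter)')
    >>> print('the value I actually care about')
    the value I actually care about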

That being said, it's a reasonable use case, and code that writes code is something I'm generally into. I'm not against giving the user the tools to shoot themselves in the foot, although I would want to clearly document that something like this would have that capability. I don't think I would want to implement this myself, but I'd probably accept a PR if you wanted to take a shot at it.

Each DocTest object (defined in xdoctest.doctest_example.DocTest) knows the file (self.modpath) and line number (self.lineno) that the doctest begins on, and contains a list of xdoctest.doctest_part.DoctestPart objects (self._parts). Each DoctestPart contains exec_lines and want_lines, so it should be possible to determine the start and end line number of every "want" statement you would like to replace.

The best way to do this would likely be to add some code in xdoctest.runner.doctest_module that runs after all doctests are finished: look at the "failed" doctests (see run_summary['failed']), loop through those to build a list of the files, line numbers, and "want" outputs for each doctest, and then go through those files / line numbers in reverse order and update the text in the files. This new feature should have a CLI flag in xdoctest.__main__ that defaults to False, and the functionality should happen after _print_summary_report is called.
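A very rough sketch of that flow, using only the attributes named above plus a hypothetical part.got for the captured output (treat the whole thing as pseudocode against the real internals, not as the actual xdoctest API):

    import collections

    def _rewrite_want_strings(run_summary):
        # Hypothetical post-run hook: overwrite each failing "want" with the
        # text the doctest actually produced. Only modpath / lineno / _parts /
        # exec_lines / want_lines come from the description above; part.got is
        # a guess at where the captured output lives.
        edits = collections.defaultdict(list)  # modpath -> [(first, last, new_lines)]

        for doctest in run_summary['failed']:
            lineno = doctest.lineno              # line the doctest begins on (1-based)
            offset = 0
            for part in doctest._parts:
                offset += len(part.exec_lines)
                if part.want_lines:
                    first = lineno + offset                   # first "want" line
                    last = first + len(part.want_lines) - 1   # last "want" line
                    got = getattr(part, 'got', None)          # hypothetical attribute
                    if got is not None:
                        edits[doctest.modpath].append((first, last, got.splitlines()))
                    offset += len(part.want_lines)

        # Rewrite each file bottom-up so earlier line numbers stay valid.
        # (A real implementation would also re-apply the original indentation.)
        for modpath, file_edits in edits.items():
            with open(modpath) as f:
                lines = f.read().splitlines()
            for first, last, new_lines in sorted(file_edits, reverse=True):
                lines[first - 1:last] = new_lines
            with open(modpath, 'w') as f:
                f.write('\n'.join(lines) + '\n')

Gating this behind the CLI flag described above (defaulting to False) would keep the default behavior unchanged.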

FYI, I'm warming up to this idea. I've experienced several cases where it would have been useful. I might implement it. I'll comment if I start on it, so if anyone else wants to have a go at it, feel free to get to it before I do.

I'm actually very warm 🔥 on this idea. I actively want it. I've been fixing pytorch's doctests and having this would have saved a ton of time.

I hacked in an experimental feature that adds a +SKIP directive to every failing test. Handling this feature should only be slightly more involved than that. It would just need to find the lines corresponding to the "WANT" and replace them with the "GOT" string, which should be stored in the doctest's member variables.

def _auto_disable_failing_tests_hook(context):