faif / python-patterns

A collection of design patterns/idioms in Python

[Infrastructure] Testing outputs

gyermolenko opened this issue · comments

update on the idea of testing outputs as docstrings (e.g. with python -m doctest):
I couldn't find an easy way to generate docstrings for def main()

By "testing outputs" I mean to compare script output with ### OUTPUT ### section at the bottom of the file.
Structuring scripts like that:

def main():
    ...

if __name__ == "__main__":
    main()

allows importing main in tests to evaluate and compare outputs.

It would also be more convenient to have the ### OUTPUT ### section as a variable or docstring, so we do not need to parse the file for such comparisons.

Here are example script/test files

# example script file

def main():
    print("abc")


if __name__ == "__main__":
    main()

OUTPUT = """
abc
"""
# example test (the same for all scripts with the structure above)

from contextlib import redirect_stdout
import io

from example_script_file import main, OUTPUT

def test_output():
    f = io.StringIO()
    with redirect_stdout(f):
        main()

    real_output = f.getvalue().strip()
    expected_output = OUTPUT.strip()
    assert real_output == expected_output

@faif Hi. Can you please share your thoughts on this?

Hi,

I like this idea since it makes the outputs unit-testable.

hey @faif
I've been thinking about it a lot lately, and I keep returning to doctests.
Examples are written once and read many times. That's why I strongly believe input/output rows should interleave rather than sit in separate blocks.

Here is a basic before/after example of this idea:

CURRENT (input and then output)

def main():
    template_function(get_text, to_save=True)
    print("-" * 30)
    template_function(get_pdf, converter=convert_to_text)
    print("-" * 30)
    template_function(get_csv, to_save=True)


if __name__ == "__main__":
    main()


OUTPUT = """
Got `plain-text`
Skip conversion
[SAVE]
`plain-text` was processed
------------------------------
Got `pdf`
[CONVERT]
`pdf as text` was processed
------------------------------
Got `csv`
Skip conversion
[SAVE]
`csv` was processed
"""

PROPOSED HERE (interleaved)

def main():
    """
    >>> template_function(get_text, to_save=True)
    Got `plain-text`
    Skip conversion
    [SAVE]
    `plain-text` was processed

    >>> template_function(get_pdf, converter=convert_to_text)
    Got `pdf`
    [CONVERT]
    `pdf as text` was processed

    >>> template_function(get_csv, to_save=True)
    Got `csv`
    Skip conversion
    [SAVE]
    `csv` was processed
    """

if __name__ == "__main__":
    import doctest
    doctest.testmod()

I think the second example is much, much easier to read.

I used a short and simple output, and I understand that some difficulties can occur with longer ones, but I believe it is worth it.

As for testing, it is as simple as python script.py.
If we leave out the if __name__ == "__main__": part, we can still run it with python -m doctest script.py (or pytest --doctest-modules script.py).
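The structure described above can be sketched as one complete, runnable file (the greet function here is a hypothetical stand-in, not one of the repo's patterns):

```python
def greet(name):
    """Print a greeting.

    >>> greet("world")
    Hello, world!
    """
    print(f"Hello, {name}!")


if __name__ == "__main__":
    # Verifies the docstring examples when run as `python script.py`.
    # Silent on success; add -v to see each example being checked.
    import doctest

    doctest.testmod()
```

Running python script.py exits quietly if the examples pass, and python -m doctest script.py works even without the __main__ block.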

As for auto-generated output vs. human-generated output: auto-generated output is not automatically correct; it still needs review.
And as I said at the beginning, examples are written once and read many times. So the burden of writing doctests manually is worth the effort (imho).

I'm not very familiar with doctest, so I'm not sure whether it can replace all of the current functionality. You could pick a more complex use case and give it a try. I'm generally positive about the proposal.

I created #283 for illustration purposes.

Here are some things to point out:

  1. Doctests are a bit harder to write than "usual code" because of the indentation, the ">>> / ..." markers, etc. But for most patterns this is not an issue at all.
  2. There are some things to remember when writing doctests, described in the official docs.
    Here are the tl;dr recommendations:
  • do not rely on dict ordering in reprs (before Python 3.6) (same for the current outputs)
  • do not rely on reprs of user class instances, since they contain object addresses like 0x23abc4 (same for the current outputs)
  • tracebacks need to be shortened with "..." (a shorthand for "anything can go here"). I see this as an improvement over the current way.
  • the module name (main / script_name) and paths (absolute or relative, with or without the home dir) depend on how the script is run, from a dev terminal or on CI. Same for the current situation; the "..." workaround works.
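The traceback point from the list above, as a minimal sketch (parse_age is a hypothetical example): doctest matches only the "Traceback" header line and the final exception line, so the stack frames in between can be replaced with "..." without any extra flags:

```python
def parse_age(value):
    """Convert a string to a non-negative age.

    >>> parse_age("42")
    42

    The stack frames between the header and the exception line
    are elided with "...":

    >>> parse_age("-1")
    Traceback (most recent call last):
        ...
    ValueError: age must be non-negative
    """
    age = int(value)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age
```

This keeps the example readable and stable across environments, since real tracebacks embed machine-specific file paths.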

To update on this issue, what is left to do:

  • there are only 8 files left with the old-style ### OUTPUT ### (to be substituted with doctests)
  • remove append_output.sh
  • update Output section of readme

I think we're almost done with substituting all files with doctests!

The only one left is abstract_factory.py, which is ambiguous due to its random output for the shop = PetShop(random_animal) test case. We could use doctest.ELLIPSIS to check the random outputs (i.e. We have a lovely ... and It says ...), though this might not be the most accurate way to test, as it doesn't ensure that Dog goes with woof and Cat goes with meow. Looking at the doctest documentation, I don't think there is a way to explicitly test for such cases (or none that I am aware of). Any thoughts?
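One possible workaround, as a sketch with hypothetical names (not the actual abstract_factory.py code): phrase the doctest as a boolean check of the animal/sound pairing, which is deterministic even though the chosen animal is random:

```python
import random


class Dog:
    def speak(self):
        return "woof"


class Cat:
    def speak(self):
        return "meow"


def random_animal():
    """Return a randomly chosen pet.

    The doctest asserts the animal/sound *pairing* rather than the
    exact printed output, so it passes whichever animal is drawn:

    >>> pet = random_animal()
    >>> (type(pet).__name__, pet.speak()) in [("Dog", "woof"), ("Cat", "meow")]
    True
    """
    return random.choice([Dog, Cat])()
```

Seeding with random.seed(...) inside the doctest would also pin the choice, but which animal a given seed yields may differ across Python versions, so the pairing check is more robust.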

The other thing left to do would be to update the contributing section of README.md on writing doctests for future patterns.

@gyermolenko got it and on it, thanks for the tip!

Can we close this @gyermolenko @alanwuha? I believe yes

hey @faif , happy holidays!
Only abstract_factory.py is left, and @alanwuha wanted to work on it, I believe.
Then remove append_output.sh and update the Output section of the readme.