Erotemic / xdoctest

A rewrite of Python's builtin doctest module (with pytest plugin integration) with AST instead of REGEX.

Ignore \r\n vs \n in output

jayvdb opened this issue

When capturing the output of a CLI, and sometimes when dealing with bytes, it is common for a variable to contain \r\n on Windows and \n everywhere else.

In the following, the \n in the expected output should be interpreted as os.linesep, or, to be sensible and compatible with doctest, it should be liberal and match any line separator:

>>> 'foo\r\n'
'foo\n'
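For illustration only, this first proposal could be sketched on top of the standard library's `doctest.OutputChecker`. The class name `EOLReprChecker` is hypothetical and this is not how xdoctest works; the sketch assumes the normalization should apply to the escaped text of the repr, since that is what the got/want comparison actually sees.

```python
import doctest


class EOLReprChecker(doctest.OutputChecker):
    """Hypothetical checker: treat an escaped \\r\\n in output as \\n."""

    def check_output(self, want, got, optionflags):
        # `got` is the text doctest captured, i.e. the repr, so a Windows
        # line ending shows up as the four characters \ r \ n rather than
        # as a real carriage return. Normalize that escaped form only.
        got = got.replace('\\r\\n', '\\n')
        want = want.replace('\\r\\n', '\\n')
        return super().check_output(want, got, optionflags)


got = repr('foo\r\n') + '\n'   # what the REPL echoes for 'foo\r\n'
want = "'foo\\n'\n"            # the expected-output line 'foo\n'

print(doctest.OutputChecker().check_output(want, got, 0))  # False
print(EOLReprChecker().check_output(want, got, 0))         # True
```

With the stdlib, such a checker would be passed as `doctest.DocTestRunner(checker=EOLReprChecker())`.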

Or, for something completely different, allow expressions in the expected output:

>>> 'foo\r\n'
'foo' + os.linesep

Is your first case an example of backwards incompatibility? When I tried it, it seemed to fail in doctest as well. I added this function to dev/backwards_incompatiblity_examples_inthewild.py and ran both command lines in the docstring:

def linestep():
    r"""
    CommandLine:
        xdoctest -m ~/code/xdoctest/dev/backwards_incompatiblity_examples_inthewild.py linestep

        python -m doctest ~/code/xdoctest/dev/backwards_incompatiblity_examples_inthewild.py linestep

    >>> 'foo\r\n'
    'foo\n'
    """

In both cases they failed. If this is a backwards compatibility issue, I'm inclined to fix it, but otherwise I think handling these cases is outside the scope of xdoctest.
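The stdlib half of that check can be reproduced programmatically. A minimal sketch that feeds the same example to `doctest.DocTestParser` and `doctest.DocTestRunner`; the test name `'linestep'` only mirrors the function above.

```python
import doctest

# The example from the docstring above, as raw doctest source, so that
# "\r\n" stays escape text for the doctest parser to evaluate.
source = r"""
>>> 'foo\r\n'
'foo\n'
"""

test = doctest.DocTestParser().get_doctest(source, {}, 'linestep', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=lambda s: None)  # silence the failure report
print(results.failed, results.attempted)  # 1 1
```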

Normally I would lean towards the more permissive policy, but in this case we aren't actually checking the line feeds; we are checking the representation of the line feeds. My feeling is that this might be too much of a special case.
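The distinction between a line feed and its representation is easy to see directly: the value holds a real carriage return, but the repr that doctest compares against the want line holds a backslash followed by the letter `r`.

```python
text = 'foo\r\n'
shown = repr(text)  # the text a doctest compares against the want line

print(shown)          # 'foo\r\n'
print('\r' in text)   # True: the value holds a real carriage return
print('\r' in shown)  # False: the repr holds a backslash and an 'r'
```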

On your second idea, I can answer that with a definitive no. Adding expressions in the "want" part of a "got/want" block is opening a huge can of worms. It takes a system that's well defined and makes it ambiguous. How do you differentiate between literal statements and expression statements? You can't.
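A small hypothetical illustration of that ambiguity: if want lines could be expressions, the same line `1 + 2` has two readings that disagree. `eval` is used here only to demonstrate the expression reading.

```python
# Suppose the got/want comparison allowed expressions in the want part.
got = '1 + 2'    # output of: >>> print('1 + 2')
want = '1 + 2'   # the expected-output line in the docstring

as_literal = got == want                # want read as literal text
as_expression = got == str(eval(want))  # want read as an expression: '3'

print(as_literal, as_expression)  # True False
```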

However, you can use expressions with a simple assert. I would recommend this for your use case:

>>> import os
>>> assert 'foo\r\n' == 'foo' + os.linesep
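The assert compares the values themselves rather than their repr text. A minimal sketch of running such an assert-based example with the standard library's doctest machinery; `render_line` is a hypothetical stand-in for captured CLI output. The left-hand side comes from a function here, since a hard-coded `'foo\r\n'` equals `'foo' + os.linesep` only on Windows.

```python
import doctest
import os


def render_line(text):
    # Hypothetical stand-in for captured CLI output: a line ending
    # with the platform's separator ('\r\n' on Windows, '\n' elsewhere).
    return text + os.linesep


# Both sides are built from os.linesep, so this passes on every
# platform; a passing assert prints nothing.
examples = """
>>> import os
>>> assert render_line('foo') == 'foo' + os.linesep
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(examples, {'render_line': render_line},
                          'render_line', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)  # 0 2
```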

Alternatively, you could write:

>>> 'foo\r\n'
'foo...\n'

It isn't a backwards compatibility issue; the problem also occurs in doctest.

The two examples you give do solve the problem, but they reduce the readability of the doctest. The latter is unacceptable because the resulting string needs to be shown almost exactly for the user to understand what is happening.

You are right about it being the representation, and that being why few people encounter or care about it.

What about adding a new directive +IGNORE_EOL_REPRESENTATION?

I would have thought this problem was similar to string u prefixes and the unicode directives that were spawned by it.

This isn't a high priority, as we already have a workaround, foo.replace('\r\n', '\n'), and can solve this internally with bskinn/stdio-mgr#80.
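A minimal sketch of that `replace` workaround; `captured_output` is a hypothetical stand-in for whatever captures the CLI text. Normalizing before display keeps the want line readable and identical on every platform.

```python
import os


def captured_output():
    # Hypothetical stand-in for text captured from a CLI: the line
    # ending is '\r\n' on Windows and '\n' on POSIX systems.
    return 'foo' + os.linesep


# After normalization the repr is the same everywhere, so a doctest
# can show the plain want line 'foo\n'.
normalized = captured_output().replace('\r\n', '\n')
print(repr(normalized))  # 'foo\n'
```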

  >>> 'foo\r\n'
  'foo...\n'

But this want would also match a result of foobarbazquuzmooooooo\n. Unless xdoctest's implementation behaves differently, doctest.ELLIPSIS is a less selective hammer than is desirable here.
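The over-matching is easy to confirm with the standard library's `doctest.OutputChecker` and the `ELLIPSIS` flag. As noted, xdoctest's own matching may behave differently; this only shows stdlib behavior.

```python
import doctest

checker = doctest.OutputChecker()
want = "'foo...\\n'\n"  # the want line 'foo...\n' from the example above

for value in ['foo\r\n', 'foobarbazquuzmooooooo\n', 'bar\n']:
    got = repr(value) + '\n'  # what the REPL would echo for this value
    print(repr(value), checker.check_output(want, got, doctest.ELLIPSIS))
# 'foo\r\n' True
# 'foobarbazquuzmooooooo\n' True
# 'bar\n' False
```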

What about adding a new directive +IGNORE_EOL_REPRESENTATION?

I'd rather avoid adding new directives when possible. My view is that they are sometimes necessary, but they're often too clunky, especially when it comes to modifying the got/want string behavior.

I would have thought this problem was similar to string u prefixes and the unicode directives that were spawned by it.

It is, but I'd really like to remove that code after Python 2 reaches EOL. I'd rather not add more bug-prone heuristics like that if possible. I've actually been bitten by the b prefix handling before, in this test:

    >>> from pprint import pprint as pp  # ensuring proper key ordering
    >>> omd = {'a': 3, 'b': 2}
    >>> pp(dict(omd))
    {'a': 3, 'b': 2}

Due to the way these checks are implemented, when I had a broken "want" string in that doctest, it reported the expected string as {'a': 3, '': 2}. Unfortunately, xdoctest is stuck with this quirk for the time being, but I'd like to avoid adding more if possible.
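To show how such a heuristic can misfire, here is a deliberately naive b-prefix stripper. The function `strip_b_prefix` and its regex are assumptions for illustration, not xdoctest's actual implementation, but they reproduce exactly the mangled `{'a': 3, '': 2}` described above: the key `'b'` looks like a `b'` prefix to the pattern.

```python
import re


def strip_b_prefix(text):
    # Hypothetical, deliberately naive heuristic (NOT xdoctest's code):
    # drop a "b" that starts a word and is immediately followed by a quote.
    return re.sub(r"\bb(['\"])", r"\1", text)


print(strip_b_prefix("b'abc'"))            # 'abc'  (intended)
print(strip_b_prefix("{'a': 3, 'b': 2}"))  # {'a': 3, '': 2}  (mangled key)
```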

@bskinn, you're correct that the ellipsis is highly permissive, which is why the assert-based solution was my first recommendation. My philosophy on the "got/want" strings is that they are primarily there for documentation purposes; we can check they are mostly OK, but they aren't meant to be robust tests. If you want robust tests, you should use real code, not a heuristic-heavy matching check.

I'm going to close this issue because you have a workaround.