Erotemic / xdoctest

A rewrite of Python's builtin doctest module (with pytest plugin integration) with AST instead of REGEX.

UnicodeDecodeError in pytest plugin active on html5lib repo

jayvdb opened this issue

If the xdoctest plugin is active when running tests on html5lib, it fails on a .txt file which is incorrectly assumed to be utf-8.

_______________________________________________________________________________ ERROR collecting test_stream.py ________________________________________________________________________________
/usr/lib/python3.7/site-packages/pluggy/hooks.py:289: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
/usr/lib/python3.7/site-packages/pluggy/manager.py:87: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
/usr/lib/python3.7/site-packages/pluggy/manager.py:81: in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
/usr/lib/python3.7/site-packages/_pytest/python.py:234: in pytest_pycollect_makeitem
    res = list(collector._genfunctions(name, obj))
/usr/lib/python3.7/site-packages/_pytest/python.py:403: in _genfunctions
    self.ihook.pytest_generate_tests.call_extra(methods, dict(metafunc=metafunc))
/usr/lib/python3.7/site-packages/pluggy/hooks.py:327: in call_extra
    return self(**kwargs)
/usr/lib/python3.7/site-packages/pluggy/hooks.py:289: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
/usr/lib/python3.7/site-packages/pluggy/manager.py:87: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
/usr/lib/python3.7/site-packages/pluggy/manager.py:81: in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
/usr/lib/python3.7/site-packages/_pytest/python.py:129: in pytest_generate_tests
    metafunc.parametrize(*marker.args, **marker.kwargs)
/usr/lib/python3.7/site-packages/_pytest/python.py:964: in parametrize
    function_definition=self.definition,
/usr/lib/python3.7/site-packages/_pytest/mark/structures.py:121: in _for_parametrize
    len(param.values)
E   TypeError: object of type 'MarkDecorator' has no len()
___________________________________________________________________ ERROR collecting testdata/encoding/chardet/test_big5.txt ___________________________________________________________________
/usr/lib/python3.7/site-packages/xdoctest/plugin.py:191: in collect
    text = self.fspath.read_text(encoding)
/usr/lib/python3.7/site-packages/py/_path/common.py:165: in read_text
    return f.read()
codecs:322: in decode
    ???
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 0: invalid start byte

The code that runs text file doctests is an almost verbatim copy of the original doctest plugin that is built into pytest. I'm curious whether this works with the original doctest plugin, or whether it runs into the same issue here: https://github.com/pytest-dev/pytest/blob/master/src/_pytest/doctest.py#L322

The corresponding line in xdoctest is https://github.com/Erotemic/xdoctest/blob/master/xdoctest/plugin.py#L191

A quick fix might be to change xdoctest_encoding from utf8 to something more suitable in your pytest.ini. (This would be analogous to setting doctest_encoding for the original pytest plugin.)

Can you point me to the text file that it's failing on in html5lib? Is it a text file that should be read and parsed? Or is it a binary file that shouldn't have been parsed in the first place?

testdata/encoding/chardet/test_big5.txt

Thanks, I learned something new about encodings.

So, the issue is that this plugin (like the original doctest pytest plugin) assumes that any file matching the "xdoctestglob" pattern (which defaults to "test*.txt") is a text file that contains a doctest. However, the test_big5.txt file is not a doctest at all. It's a data file meant to test that documents using the "big5" encoding work properly.
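The failure can be reproduced without pytest. What follows is a minimal sketch, not the plugin's actual code: it takes the default glob "test*.txt" from the comment above on trust and uses a two-character Big5 string as a stand-in for the real contents of test_big5.txt.

```python
import fnmatch

# Default text-file pattern as described above (an assumption taken from the thread).
DEFAULT_GLOB = "test*.txt"

# The data file is picked up purely because of its name.
matches_glob = fnmatch.fnmatch("test_big5.txt", DEFAULT_GLOB)

# Stand-in for the file contents: two Chinese characters encoded as Big5.
big5_bytes = "\u4e2d\u6587".encode("big5")

# Decoding Big5 bytes as UTF-8 fails, which is the error in the traceback.
try:
    big5_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(matches_glob)                 # True
print(type(decode_error).__name__)  # UnicodeDecodeError
```

The file name alone decides whether the file is read, so a data file that happens to be called test_*.txt is decoded before anything checks whether it contains a doctest.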

So, there are a few ways we can resolve this issue. The first two can happen without any modifications to the xdoctest source code:

  • Add the option --xdoctest-glob '' to the command line or the pytest.ini file. This prevents any file from matching the text-file doctest pattern.

  • Add xdoctest_encoding=big5 to the pytest.ini. This is definitely not the right thing to do because the file doesn't even contain a doctest. But it does prevent the error from happening (although I imagine it will cause other errors for non-big5 encoded files).
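To see why an empty glob works as an off switch, here is a rough sketch of glob-based selection. The helper is_text_doctest_candidate is hypothetical, and how xdoctest actually handles an empty pattern internally is an assumption; the sketch only shows that an empty pattern cannot match a non-empty file name.

```python
import fnmatch


def is_text_doctest_candidate(filename, patterns):
    """Hypothetical helper: True if the file name matches any glob pattern."""
    return any(fnmatch.fnmatch(filename, pat) for pat in patterns)


default_patterns = ["test*.txt"]  # default described in the thread
disabled_patterns = [""]          # assumed result of passing '' as the only glob

default_hit = is_text_doctest_candidate("test_big5.txt", default_patterns)
disabled_hit = is_text_doctest_candidate("test_big5.txt", disabled_patterns)

print(default_hit)   # True
print(disabled_hit)  # False
```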

Then we can also change the xdoctest source.

  • We could make the xdoctest-glob patterns empty by default. I somewhat like this idea because I don't really like these doctest text files anyway; they exist only to maintain backwards compatibility with the original doctest plugin. On the other hand, I do want to keep that compatibility, so I don't want to make a change like this lightly.

  • We can automatically skip any file that fails to parse. We might even add an option that raises an error when there is an encoding issue, but keep it off by default. This would prevent the error, but it doesn't feel as clean as the previous "opt-in" solution.
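The "skip files that fail to parse" idea could look roughly like the sketch below. It is an illustration only, not what xdoctest does: read_text_or_none and its strict flag are hypothetical names, and the temporary files stand in for a real doctest file and the Big5 data file.

```python
import tempfile
from pathlib import Path


def read_text_or_none(path, encoding="utf-8", strict=False):
    """Hypothetical helper: return the file's text, or None if it cannot be decoded.

    With strict=True the UnicodeDecodeError is re-raised instead of swallowed.
    """
    try:
        return Path(path).read_text(encoding=encoding)
    except UnicodeDecodeError:
        if strict:
            raise
        return None


with tempfile.TemporaryDirectory() as tmp:
    # A genuine text-file doctest, stored as UTF-8.
    doc = Path(tmp) / "test_doc.txt"
    doc.write_text(">>> 1 + 1\n2\n", encoding="utf-8")

    # A data file whose bytes are Big5, not UTF-8.
    data = Path(tmp) / "test_big5.txt"
    data.write_bytes("\u4e2d\u6587".encode("big5"))

    readable = read_text_or_none(doc)   # text of the doctest file
    skipped = read_text_or_none(data)   # None: the file is silently skipped

    # The opt-in strict mode surfaces the encoding problem instead.
    try:
        read_text_or_none(data, strict=True)
        strict_raised = False
    except UnicodeDecodeError:
        strict_raised = True

print(skipped is None, strict_raised)
```

The trade-off is the one noted above: silent skipping hides genuinely broken doctest files unless the strict mode is switched on.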

I think that the 3rd option is best. I don't like how the current behavior just assumes that all "test*.txt" files contain doctests. In most projects I would wager they typically won't. I created PR #59, which implements this change. I'll wait a day or two before deciding whether to merge, in order to hear out any other opinions on the matter.

This must be solved in the xdoctest code, because it causes failures whenever the xdoctest plugin is installed, even for projects that do not explicitly use xdoctest and thus will not want to add xdoctest options to any config or command line.

Oh, that is very surprising. I wonder how pytest's doctest plugin avoids this, given that it's a builtin plugin. I don't see anything in the source that would prevent this for doctest but leave it an issue for xdoctest. Then again, pytest's architecture is not very transparent, and making sense of it requires a deep understanding of its structure and mechanisms.

Anyway, this is a serious error, so I'll probably merge #59 and push a new release to pypi sometime tomorrow. Thanks for reporting it.

So, I verified that the original doctest plugin has the same behavior, even when xdoctest is not installed. I actually had to monkey patch out an additional doctest function, but now it seems to work.

Basic steps I took:

  • clone html5lib-tests

  • run pytest

  • got error in xdoctest

  • uninstall xdoctest

  • got error in pytest

Now that this issue is fixed in xdoctest, running pytest in html5lib-tests works when xdoctest is enabled, but it still fails in _pytest.doctest when xdoctest is disabled. (all the more reason to switch to xdoctest)