pypiserver / pypiserver

Minimal PyPI server for uploading & downloading packages with pip/easy_install

After upgrading pip to 22.0, pip fails against pypiserver with "HTML doctype missing or incorrect. Expected <!DOCTYPE html>."

lidyum opened this issue

After I upgraded pip to 22.0, using pypiserver started producing the following error (I use Python 3.8):

ValueError: HTML doctype missing or incorrect. Expected <!DOCTYPE html>.

If you believe this error to be incorrect, try passing the command line option --use-deprecated=html5lib and please leave a comment on the pip issue at https://github.com/pypa/pip/issues/10825.

I get the same issue too. As a temporary workaround, I commented out the following lines:

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        #if not self._seen_decl:
        #    self._raise_error()

at line 408 of pip's venv/lib/python3.9/site-packages/pip/_internal/index/collector.py.

I can confirm the same issue here. I am running a local PyPI server and cannot query packages anymore, starting with pip 22. The reason seems to be a change in the HTML parsing code used by pip (see pypa/pip#10825).

If I understand the discussion correctly, pip now rejects HTML documents that are not valid HTML5. According to the spec (see PEP 503: https://www.python.org/dev/peps/pep-0503/#specification), this is a valid expectation.

I have just spun up a pypiserver instance and it appears that the generated pages omit the doctype declaration entirely. It's present on the welcome page, but not on /simple or the project pages. It's an easy fix, though.
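For illustration, a minimal sketch of a PEP 503-style /simple/ root page that starts with the doctype declaration pip 22.0 checks for. `render_simple_index` is a hypothetical helper, not pypiserver's actual template code, and it assumes the project names passed in are already PEP 503-normalized.

```python
from html import escape
from urllib.parse import quote


def render_simple_index(projects):
    """Hypothetical helper: render a /simple/ root page listing projects.

    Assumes project names are already PEP 503-normalized.
    """
    # One anchor per project, sorted for a stable listing.
    links = "\n".join(
        f'    <a href="{quote(name)}/">{escape(name)}</a><br>'
        for name in sorted(projects)
    )
    return (
        "<!DOCTYPE html>\n"  # the declaration that was missing on /simple
        "<html>\n"
        "  <head><title>Simple Index</title></head>\n"
        "  <body>\n"
        f"{links}\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(render_simple_index(["requests", "pypiserver"]))
```

The only point that matters for pip 22.0 is the first line: the declaration must come before any tag. Per-project pages need the same prefix.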

Thanks for that instant fix 🙂.

After a fix is released for this, does anyone have an idea of how we should update our indexes? Will we need to download all our packages again to regenerate them, or is there another way? I'm not too knowledgeable about how pip works.

Maybe I just don't get the point... but I am not sure what exactly you are referring to.

  • Updating the local indexes served by your pypiserver?
  • Updating python packages on client side?
  • Something different?

First option! I mean, even after upgrading pypiserver (after a fix is provided), the old HTML documents without the doctype directive will remain in my indexes, won't they? Or does pypiserver take care of that?

I think the problem has been resolved, since pip has been updated to 22.0.1?

pip changelog:

22.0.1 (2022-01-30)
Bug Fixes

  • Accept lowercase <!doctype html> on index pages. (#10844)

  • Properly handle links parsed by html5lib, when using `--use-deprecated=html5lib`. (#10846)

Pip 22.0.1 does not "fix" the breaking change introduced in version 22.0.0. It still rejects HTML that contains no DOCTYPE at all. However, version 22.0.1 also accepts a lowercase doctype, which 22.0.0 did not.

In contrast, pip 22.0.2 reverts the original breaking change and only issues a warning instead of rejecting pages that do not comply with the HTML5 spec.
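The behavior described in the two paragraphs above can be approximated with a small model built on the standard library's `html.parser`. This is a simplified sketch, not pip's actual code: `DoctypeChecker`, `check_index_page` and its `case_sensitive`/`strict` switches are hypothetical names, and the mapping of switches to pip releases is an assumption based on this thread.

```python
import warnings
from html.parser import HTMLParser


class DoctypeChecker(HTMLParser):
    """Record the first declaration (e.g. "DOCTYPE html") seen before any tag."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.seen_tag = False

    def handle_decl(self, decl):
        # html.parser passes the text between "<!" and ">", e.g. "DOCTYPE html".
        if not self.seen_tag and self.doctype is None:
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.seen_tag = True


def check_index_page(html, *, case_sensitive, strict):
    """Simplified model of the doctype check (hypothetical, not pip's code).

    Returns True for a valid doctype; for an invalid page, raises ValueError
    when strict, otherwise warns and returns False.
    """
    checker = DoctypeChecker()
    checker.feed(html)
    checker.close()
    doctype = checker.doctype
    if doctype is None:
        valid = False
    elif case_sensitive:
        valid = doctype == "DOCTYPE html"
    else:
        valid = doctype.lower() == "doctype html"
    if valid:
        return True
    if strict:
        raise ValueError("HTML doctype missing or incorrect. Expected <!DOCTYPE html>.")
    warnings.warn("index page does not declare <!DOCTYPE html>")
    return False


# Assumed settings for the releases discussed in this thread:
#   pip 22.0   -> case_sensitive=True,  strict=True
#   pip 22.0.1 -> case_sensitive=False, strict=True
#   pip 22.0.2 -> case_sensitive=False, strict=False
```

Under this model, a page with no doctype fails under both 22.0 and 22.0.1 settings, `<!doctype html>` passes only from 22.0.1 on, and 22.0.2 settings merely warn.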

However, I want to point out once again (as discussed in depth in pypa/pip/issues/10825) that the actual origin of the issue is not pip's way of interpreting HTML, but rather the more or less lax way in which various PyPI server implementations (fail to) comply with the standards.

Luckily, #413 addresses that with respect to pypiserver.

As far as I understand, the latest version of pip (22.0.3) no longer errors out when it detects invalid HTML. Instead, it only emits a warning.

Just a heads up: this message disappears when upgrading from 22.0.2 to the latest version (24.0 as of this writing).

For anyone who might still encounter this issue, I recommend updating pypiserver to the latest version and also updating pip to the latest version (pip install --upgrade pip).
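To check whether a client's pip falls into the range that raises on a missing doctype (22.0 and 22.0.1, per this thread), a small sketch using `importlib.metadata` could look like this. The helper names are hypothetical, and the version parsing only looks at the leading numeric X.Y.Z part, which is an assumption rather than full PEP 440 handling.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Leading numeric release segment, e.g. "22.0" or "22.0.1" (not full PEP 440).
_RELEASE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def release_tuple(v):
    """Hypothetical helper: '22.0.1' -> (22, 0, 1); missing parts become 0."""
    m = _RELEASE.match(v)
    if m is None:
        raise ValueError(f"unrecognized version: {v!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def rejects_missing_doctype(pip_version):
    """True for pip 22.0 and 22.0.1, which raise instead of warning."""
    return (22, 0, 0) <= release_tuple(pip_version) <= (22, 0, 1)


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        installed = None
    if installed is None:
        print("pip is not installed in this environment")
    elif rejects_missing_doctype(installed):
        print(f"pip {installed} is affected; run: pip install --upgrade pip")
    else:
        print(f"pip {installed} does not raise on a missing doctype")
```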