pypa / pip

The Python package installer

Home Page:https://pip.pypa.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

pip install fails when PEP658 metadata is present if the user tries to install using a non-normalized package name

chriskuehl opened this issue · comments

Description

When installing packages from a registry which supports PEP658, pip install produces an error if the user tries to install a package using its non-normalized name.

For example, testing here with the fluffy-server package using an underscore instead of a hyphen:

$ pip install -i http://localhost:7777/simple fluffy_server 
Looking in indexes: http://localhost:7777/simple
Collecting fluffy_server
  Obtaining dependency information for fluffy_server from http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl.metadata
  Downloading http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl (from http://localhost:7777/simple/fluffy-server/): Requested fluffy_server from http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
ERROR: Could not find a version that satisfies the requirement fluffy_server (from versions: 1.39.2)
ERROR: No matching distribution found for fluffy_server

Package name variants like fluffy_server, FLUFFY-SERVER, etc. all fail. Only the normalized name fluffy-server succeeds.

Expected behavior

pip should allow installing using the non-normalized package name as it will be very confusing for users otherwise.

pip version

Tested with 23.1.2 and current main

Python version

3.11

OS

Linux (debian bullseye)

How to Reproduce

I have created a minimal reproduction PEP503 static file registry here which you can easily test with: https://github.com/chriskuehl/pip-pep658-normalization-bug-repro

  1. Clone the repo
  2. Start a static file server at the repo root, e.g. python3 -m http.server 7777
  3. Try to install the package with pip install -i http://localhost:7777/simple fluffy_server and observe the error from above.

This can also be reproduced using pypi.org, however you need to use the PEP503 (HTML /simple/ files) rather than the new JSON format. I could not find a built-in option to ask pip to use the HTML index, but if you comment out these two lines and force pip to use text/html, you can see that pip discards the wheel and installs from the sdist instead:

$ pip install fluffy_server                                                                                       
Collecting fluffy_server
  Obtaining dependency information for fluffy_server from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl.metadata
  Downloading fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl#sha256=cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5 (from https://pypi.org/simple/fluffy-server/) (requires-python:>=3.10): Requested fluffy_server from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl#sha256=cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5 has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
  Downloading fluffy-server-1.39.2.tar.gz (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.3/97.3 kB 1.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done

Code of Conduct

Hmm, I would have expected pip to always send the wheel url with .metadata appended, which is what PEP 658 specifies, and which will work unless the server has inconsistent filenames. I can’t tell for certain (I’m on mobile right now) but I suspect PyPI isn’t following that rule (wheel file name normalisation uses underscores rather than hyphens, so maybe that’s the problem). If my suspicion is right, this will need reporting to the Warehouse project. And your local server will need fixing as well.

Hmm, I would have expected pip to always send the wheel url with .metadata appended, which is what PEP 658 specifies, and which will work unless the server has inconsistent filenames.

I believe it is. These are the requests I see:

127.0.0.1 - - [20/May/2023 03:16:21] "GET /simple/fluffy-server/ HTTP/1.1" 200 -
127.0.0.1 - - [20/May/2023 03:16:21] "GET /packages/fluffy_server-1.39.2-py3-none-any.whl.metadata HTTP/1.1" 200 -

It makes these same requests no matter how I enter the package name, so I believe it is requesting the correct files.

I think the problem is that when it downloads and parses the metadata, it compares the name parsed from the metadata file (which is normalized) directly to the name the user entered without normalizing it.

Perhaps a change like this:

--- src/pip/_internal/operations/prepare.py     2023-05-20 03:23:56.269263381 -0500
+++ src/pip/_internal/operations/prepare.py     2023-05-20 03:23:52.665267870 -0500
@@ -410,7 +410,7 @@
         #     NB: raw_name will fall back to the name from the install requirement if
         #     the Name: field is not present, but it's noted in the raw_name docstring
         #     that that should NEVER happen anyway.
-        if metadata_dist.raw_name != req.req.name:
+        if metadata_dist.canonical_name != canonicalize_name(req.req.name):
             raise MetadataInconsistent(
                 req, "Name", req.req.name, metadata_dist.raw_name
             )

Seems to fix it for me but I'm not sure if there are any edge cases.

The fact that the comment above the line suggests that raw_name was chosen specifically (and mentions that it falls back to canonical_name which they are trying to avoid for some reason) makes me think there may be additional concerns I'm not aware of.

Ah yes - thanks for the analysis! I'll take a closer look at the code, but IMO your suggested fix looks right, so if there are any additional concerns, I'd want to fix them another way (or maybe just not worry about them unless someone reports them in a real-world situation 🙂)

I tracked down a comment about name verification from the PR that introduced this code which has some context: #11111 (comment)

If I'm understanding correctly, I think they didn't want to use metadata_dist.canonical_name directly since this isn't populated by parsing the METADATA file, but instead passed in when constructing the object:
https://github.com/pypa/pip/pull/11111/files#diff-dc6d0ea33ced90d3436f4edb226bbccebdddf1bdf8a0c91b2cc178546198769eR113-R128

Since the goal is to verify the METADATA file, it makes a lot of sense to not use it and instead want to use raw_name which does come from the METADATA file.

So perhaps it makes sense to still use raw_name but canonicalize it?

--- src/pip/_internal/operations/prepare.py     2023-05-20 03:23:56.269263381 -0500
+++ src/pip/_internal/operations/prepare.py     2023-05-20 03:23:52.665267870 -0500
@@ -410,7 +410,7 @@
         #     NB: raw_name will fall back to the name from the install requirement if
         #     the Name: field is not present, but it's noted in the raw_name docstring
         #     that that should NEVER happen anyway.
-        if metadata_dist.raw_name != req.req.name:
+        if canonicalize_name(metadata_dist.raw_name) != canonicalize_name(req.req.name):
             raise MetadataInconsistent(
                 req, "Name", req.req.name, metadata_dist.raw_name
             )

(Not sure if it's necessary to canonicalize the raw_name or not. In my case it is already canonicalized in the metadata but maybe some wheels have it not canonicalized for whatever reason?)

So perhaps it makes sense to still use raw_name but canonicalize it?

Yes, that's precisely the conclusion I'd just come to from looking at the code, so what you found on the tracker makes perfect sense to me. I think it's right to canonicalize explicitly, as the spec doesn't require the name in the metadata file to be normalised, and actually recommends normalising before comparison.

Would you be interested in making this fix into a PR? What you posted above plus a short changelog would be sufficient - it would be nice to add a test if you can work out how to trigger the behaviour from a test, but if not, I'd be happy to look into that. Alternatively, I'll make a PR, but I don't want to take the credit if you'd like to do it 🙂

One thought - why isn't this reproducible if you use the JSON index from PyPI? That should work the same. If it's not failing in the same way, that suggests the JSON index isn't hitting the PEP 658 fastpath in the first place (because the files will be the same either way).

Sounds good, I'll try to get a PR up later today. Will also check what's happening with the JSON index, I haven't looked into it all other than noticing that it doesn't seem to be downloading the metadata files in that mode.

Cool - it would be a shame if the new JSON form of the index is disabling the new separate metadata file 🙁 If fixing that is more complex, though, it can be split into a separate PR if you'd prefer to do that.

The reason this doesn't happen with the JSON index is because pip is looking for a dist-info-metadata key while pypi.org provides a data-dist-info-metadata key:

$ curl -H 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/fluffy-server/ | jq .files
[...]
  {
    "data-dist-info-metadata": {
      "sha256": "4db99543165cbdeef42ccb6257545911ccd7865d65e304e3e056f383a25f309c"
    },
    "filename": "fluffy_server-1.39.2-py3-none-any.whl",
    "hashes": {
      "sha256": "cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5"
    },
    "requires-python": ">=3.10",
    "size": 119321,
    "upload-time": "2023-05-14T04:36:58.299962Z",
    "url": "https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl",
    "yanked": false
  }
[...]

I think pypi.org is providing the wrong key name as it should not be prefixed with data- according to PEP691: https://peps.python.org/pep-0691/#project-detail

If I temporarily patch pip to use the (incorrect) key name for the PyPI index, I get a traceback raised:

Traceback (most recent call last):
  File "/home/ckuehl/proj/pip/src/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/cli/req_command.py", line 248, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
                ^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
    super().__init__(
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 510, in prepare_linked_requirement
    metadata_dist = self._fetch_metadata_only(req)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 374, in _fetch_metadata_only
    return self._fetch_metadata_using_link_data_attr(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 384, in _fetch_metadata_using_link_data_attr
    metadata_link = req.link.metadata_link()
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 421, in metadata_link
    metadata_link_hash = LinkHash.parse_pep658_hash(self.dist_info_metadata)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 79, in parse_pep658_hash
    name, sep, value = dist_info_metadata.partition("=")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'partition'

I think there is a bug on both sides:

If I further patch pip to parse the dict correctly, it reproduces the original error from this issue when using the JSON index:

$ pip install --no-cache --no-deps --only-binary :all: fluffy_server==1.39.2
Collecting fluffy_server==1.39.2
  Obtaining dependency information for fluffy_server==1.39.2 from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl.metadata
  Downloading fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl (from https://pypi.org/simple/fluffy-server/) (requires-python:>=3.10): Requested fluffy_server==1.39.2 from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
ERROR: Could not find a version that satisfies the requirement fluffy_server==1.39.2 (from versions: 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.5.0, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.11.0, 1.11.1, 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4, 1.16.0, 1.16.1, 1.17.0, 1.18.0, 1.19.0, 1.19.1, 1.19.2, 1.20.0, 1.20.1, 1.20.2, 1.20.3, 1.21.0, 1.22.0, 1.23.0, 1.23.1, 1.24.0, 1.24.1, 1.24.2, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.29.0, 1.29.1, 1.30.0, 1.31.0, 1.32.0, 1.33.0, 1.33.1, 1.33.2, 1.33.3, 1.33.4, 1.34.0, 1.35.0, 1.36.0, 1.36.1, 1.37.0, 1.38.0, 1.39.0, 1.39.1, 1.39.2)
ERROR: No matching distribution found for fluffy_server==1.39.2

I will go file separate tickets for the pip and warehouse issues discovered. I wonder if it will be hard for warehouse to fix the key name since I think it would immediately break a lot of existing pip versions if they did due to the parsing bug.

Here are the 2 bugs:

I will still try to create a PR for this issue now.