prefix-dev / rip

Solve and install Python packages quickly with rip (pip in Rust)

Home Page: https://prefix.dev


rip backtracks too far (performance issue)

notatallshaw opened this issue · comments

Environment: Linux, Python 3.11

Command: cargo r -- apache-airflow[all]==2.8.1

Error:

2024-01-26T16:45:56.613739Z ERROR rattler_installs_packages::index::package_database: Error from source distributions 'apache-beam-2.42.0.zip' skipped: 
 could not build wheel: <string>:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Traceback (most recent call last):
  File "/tmp/.tmpG0tRNU/build_frontend.py", line 124, in <module>
    get_requires_for_build_wheel(backend, work_dir)
  File "/tmp/.tmpG0tRNU/build_frontend.py", line 58, in get_requires_for_build_wheel
    result = f()
             ^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
    self.run_setup()
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 480, in run_setup
    super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 99, in <module>
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 528, in get_distribution
    dist = get_provider(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 400, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 829, in resolve
    dist = self._resolve_dist(
           ^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 870, in _resolve_dist
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pip' distribution was not found and is required by the application

  × Could not solve for requested requirements
  ╰─▶ could not find metadata for any sdist or wheel for this package. No metadata could be extracted for the following available artifacts:
        - apache-beam-2.42.0.zip
  help: Probably an error during processing of source distributions. Please check the error message above.

Expected Behavior: The issue isn't the error itself; this package is too old to install in my environment.

Rather, there is a performance characteristic of rip's resolution here that feels wrong. It should not be backtracking so far on apache-beam, and I had a previous case where it was doing this with snowflake-connector-python. Pip does not need to backtrack this far and is able to solve this requirement.

This might be very tricky to fix; resolution graphs this complicated are not easy to pick apart to understand why something happened.

Edit: I've updated the error with the latest output I get on main.

I don't have a good understanding of the interaction between resolvo and rip, but I would strongly suggest that, when there are multiple packages to check, resolvo somehow be guided to try packages that have a valid wheel for the current Python before trying to build sdists. It would result in better performance in a lot of situations.

We have the option to first try all wheels before resorting to sdists for a specific package, but now that I think about it, it would be nice if we could also instruct resolvo to first backtrack further and only include sdists if that fails. 🤔
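The wheel-before-sdist preference could be expressed as a candidate-ordering rule. The sketch below is purely illustrative: the `Artifact`, `Candidate`, and `order_candidates` names are invented for the example and are not rip's or resolvo's actual API.

```rust
// Hypothetical sketch; all names are invented, not the real rip/resolvo API.
use std::cmp::Reverse;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Artifact {
    Wheel { compatible: bool }, // a wheel that may or may not match this Python
    Sdist,                      // would require a build just to get metadata
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    version: (u32, u32, u32),
    artifact: Artifact,
}

/// Order candidates newest-first, but demote versions that would need an
/// sdist build (or whose wheel is incompatible) below all wheel-backed ones.
fn order_candidates(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| {
        let needs_build = !matches!(c.artifact, Artifact::Wheel { compatible: true });
        // `false` sorts before `true`, so wheel-backed candidates come first;
        // within each group, `Reverse` puts higher versions first.
        (needs_build, Reverse(c.version))
    });
    candidates
}

fn main() {
    let ordered = order_candidates(vec![
        Candidate { version: (2, 42, 0), artifact: Artifact::Sdist },
        Candidate { version: (2, 43, 0), artifact: Artifact::Wheel { compatible: true } },
        Candidate { version: (2, 41, 0), artifact: Artifact::Wheel { compatible: true } },
    ]);
    println!("{:?}", ordered);
}
```

With an ordering like this, the solver would only reach sdist-backed versions such as apache-beam-2.42.0.zip after every wheel-backed version of the package has been ruled out.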

It's something I've suggested for Pip (pypa/pip#12035), and I now have an idea in my head of how one would implement it for Pip. But I have other resolution improvements I am working on first over there, so I can't give you any real-world data on how much it helps.

Other than that, sdists are slow, even when they're not causing build problems.

But it's hard not to use them :)

> But it's hard not to use them :)

Well, it's not that you avoid using them altogether. It's just that, when backtracking to older versions of a project, if the next older version requires building an sdist instead of a wheel, it likely means the Python version you're on is too new for a package that old on that project.

It hence makes sense to try backtracking on different projects, at least until you've exhausted all other equally likely backtracking opportunities.

At least that's my hypothesis.

I think this can be done if we allow DependencyProviders control over which requirement to decide on next. We can then first try requirement clauses for packages that would result in a wheel being picked instead of an sdist.
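As a toy sketch of that idea (the `PendingRequirement` type and `select_next` function are invented here, not resolvo's real DependencyProvider interface): when several requirements are still undecided, decide first on one whose best remaining candidate is a wheel, deferring any that would force an sdist build.

```rust
// Toy sketch; names are invented, not the actual resolvo interface.
struct PendingRequirement {
    name: &'static str,
    // Whether the highest-priority remaining candidate for this requirement
    // is a usable wheel (as opposed to an sdist that would need building).
    best_candidate_is_wheel: bool,
}

/// Pick the next requirement to decide on: prefer one whose best candidate
/// is a wheel, falling back to sdist-backed requirements only when no
/// wheel-backed requirement remains.
fn select_next(pending: &[PendingRequirement]) -> Option<&PendingRequirement> {
    pending
        .iter()
        .find(|r| r.best_candidate_is_wheel)
        .or_else(|| pending.first())
}

fn main() {
    let pending = vec![
        PendingRequirement { name: "apache-beam", best_candidate_is_wheel: false },
        PendingRequirement { name: "numpy", best_candidate_is_wheel: true },
    ];
    println!("{}", select_next(&pending).unwrap().name);
}
```

The design point is that the ordering heuristic lives in the dependency provider, not the solver core, so it can encode ecosystem-specific knowledge like "sdist builds are expensive and often fail on newer Pythons".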

I have a hunch that this problem might be caused by a bug in resolvo. I think this line should be a continue instead of a return. Not totally sure, though.

That is indeed a bug! It means not all clauses for a solvable are added!
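A toy illustration of why that matters (this is not resolvo's actual code): inside a loop that is supposed to add one clause per solvable, `return` aborts the whole loop at the first skipped item, while `continue` skips only that item.

```rust
// Toy illustration of `return` vs `continue` in a clause-adding loop;
// not resolvo's actual code.

fn add_clauses_with_return(solvables: &[Option<u32>]) -> Vec<u32> {
    let mut clauses = Vec::new();
    for s in solvables {
        let Some(v) = s else { return clauses }; // aborts the whole loop
        clauses.push(*v);
    }
    clauses
}

fn add_clauses_with_continue(solvables: &[Option<u32>]) -> Vec<u32> {
    let mut clauses = Vec::new();
    for s in solvables {
        let Some(v) = s else { continue }; // skips only this solvable
        clauses.push(*v);
    }
    clauses
}

fn main() {
    let input = [Some(1), None, Some(2)];
    println!("{:?}", add_clauses_with_return(&input));   // [1]
    println!("{:?}", add_clauses_with_continue(&input)); // [1, 2]
}
```

With `return`, every solvable after the first skipped one silently loses its clauses, which is why not all clauses for a solvable get added.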

Might be a bug, but I had some discussion with @aochagavia offline, and changing it does not change the backtracking behavior. That makes sense, because it only reaches the Unknown clause when it's already at the apache-beam-2.42.0.zip version, and by then the extensive backtracking has already occurred.

> But it's hard not to use them :)
>
> Well, it's not that you avoid using them altogether. It's just that, when backtracking to older versions of a project, if the next older version requires building an sdist instead of a wheel, it likely means the Python version you're on is too new for a package that old on that project.
>
> It hence makes sense to try backtracking on different projects, at least until you've exhausted all other equally likely backtracking opportunities.
>
> At least that's my hypothesis.

I still think that this idea:

> I think this can be done if we allow DependencyProviders control over which requirement to decide on next. We can then first try requirement clauses for packages that would result in a wheel being picked instead of an sdist.

With this in mind, the implementation still seems the way to go. I wonder what ripple effect such a change would have, but I think we can only know by trying.

> I wonder what ripple effect such a change would have, but I think we can only know by trying.

Yeah, this is a very opinionated optimization I'm suggesting, and the specific details of the implementation are likely to make a big difference when resolving certain requirements.

I do think, though, that the PyPI packaging ecosystem requires opinionated optimizations to get "good" resolution behavior.