pip fails to install multiple packages in a single invocation if one depends on another

Related to #988, without dictating a solution.

If pkg-B depends on pkg-A, then:
pip install pkg-A pkg-B fails.

The same goes for installing from a properly ordered requirements.txt.

I've been working around this over and over again for a couple of
years now.

Specifically, in deployment scripts I've ended up iterating over requirements.txt files
line by line, running pip a dozen times, or cluttering up the deployment scripts with requirements files split into multiple chunks. The horror.
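A minimal sketch of that line-by-line workaround as a POSIX shell function, for anyone stuck with the same problem. The function name `install_line_by_line` and the `PIP` override variable are hypothetical, not pip features. It assumes one requirement per line, strips comments the way pip's requirements format defines them (`#` at line start, or after whitespace), and does not handle option lines such as `-e` or `-r`.

```shell
# install_line_by_line: hypothetical helper, not part of pip.
# Usage: install_line_by_line FILE [extra pip args...]
# Runs one `pip install` per requirement, in file order, so each package is
# fully installed before the next one is built. Extra arguments (for
# example --no-deps) are passed to every pip call. Set PIP to override the
# command, e.g. PIP="python -m pip".
install_line_by_line() {
    req_file=$1
    shift
    # Input: the file with comments, surrounding whitespace and blank lines removed.
    while IFS= read -r req; do
        # An empty file still yields one empty line from the here-doc; skip it.
        [ -n "$req" ] || continue
        # $PIP is deliberately unquoted so "python -m pip" splits into words.
        ${PIP:-pip} install "$@" "$req" || return 1
    done <<EOF
$(sed -E -e 's/(^|[[:space:]]+)#.*$//' -e 's/^[[:space:]]+//' -e 's/[[:space:]]+$//' -e '/^$/d' "$req_file")
EOF
}
```

For example, `install_line_by_line requirements.txt` builds bottleneck only after numpy is installed, provided numpy appears earlier in the file. The trade-off is the one discussed in this thread: there is no all-or-nothing behavior, and each line is resolved independently.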

This is not true as a general statement. Can you give a specific example of two PyPI packages?

As for your iterating deployment scripts, I'm skeptical you need to be doing that. Can you explain?

Sorry, I'll clarify my description. The problem occurs when the install/build of pkg-B depends
on pkg-A being installed.
As a comment in #988 mentions, pip tries to build all packages before installing any of them.

I see this all the time with packages that depend on numpy and/or cython, both of
which are meat-and-potatoes deps for pydata packages.

In a fresh venv, this fails:

pip install numpy bottleneck

While this succeeds:

pip install numpy 
pip install bottleneck

because bottleneck's build requires numpy to be installed.

For a similar but distinct failure case (starting with a fresh venv), do:

pip install numpy cython

and then this fails

pip install numexpr tables

but this succeeds:

pip install numexpr 
# Note: the build requires libhdf5-devel to be installed
pip install tables

In this failing case, it is the build of tables (rather than its installation) that fails, because numexpr
isn't installed at tables' build time.

Finally, any package foo that depends on cython for building itself will fail if you do:

pip install cython foo

So that's 3 distinct examples of pip's behavior making installations fail unnecessarily,
and the last case covers at least a few dozen packages out there.
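One way to script around all three examples without running pip once per package is to split the install into ordered stages, each a separate pip call. A minimal sketch, assuming you already know which packages are needed at build time; `install_in_stages` and the `PIP` variable are hypothetical names, not pip features.

```shell
# install_in_stages: hypothetical helper, not part of pip.
# Usage: install_in_stages "STAGE 1 REQS" "STAGE 2 REQS" ...
# Each argument is one stage: a space-separated list of requirements that
# pip installs in a single call. Stages run strictly in order, so anything a
# later stage needs at build time is already installed when it is built.
# Set PIP to override the command, e.g. PIP="python -m pip".
install_in_stages() {
    for stage in "$@"; do
        # $stage is deliberately unquoted so it splits into separate
        # requirements (so the shell may also glob characters such as [ ]
        # in extras). $PIP is unquoted for the same word-splitting reason.
        ${PIP:-pip} install $stage || return 1
    done
}
```

For the cases above this would be `install_in_stages "numpy cython" "numexpr bottleneck" "tables"`. Packages inside one stage still get pip's usual single-invocation treatment; only the boundaries between stages are sequential.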

That should make it clear why the steps I described were necessary.
Unluckily for me, I work with these packages all the time.

I understand why pip would want to enforce all-or-nothing semantics on multiple-package installation,
but hopefully these examples convince you that the way it's done breaks for some common, real-world
cases.
AFAIK, pip doesn't offer a flag to turn this off. Does it?

Just looking at bottleneck, it doesn't state that it install_requires numpy, just that it requires it. And yet it imports numpy in its So isn't that the issue, that bottleneck is not specifying its install-time dependencies properly?

Yes and no. There are valid reasons people avoid putting stuff in install_requires. Specifically, pip -I will
upgrade an install_requires dep if a newer version is available, even though the user may only want to upgrade the
package he/she named.

Casually upgrading a package like numpy is a big no-no because:

  1. It may cause subtle breakage to many other installed packages.
  2. It involves a monster compile.

IIRC, pip can be told not to do that, but you have to spelunk the docs quite a bit to find out how,
and most users just can't be expected to go through all that. So devs have opted for the lesser of
two evils and omit it from install_requires. I can dig up a long issue elsewhere on this if it matters enough.
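For reference, the workaround pip's user guide has described for a non-recursive upgrade is two steps: `pip install --upgrade --no-deps SomePackage`, then a plain `pip install SomePackage`, which installs any dependencies that are missing without upgrading ones already satisfied. A small wrapper around that recipe, as a sketch; `upgrade_only` and the `PIP` variable are hypothetical names.

```shell
# upgrade_only: hypothetical wrapper around pip's two-step non-recursive upgrade.
# Set PIP to override the command, e.g. PIP="python -m pip".
upgrade_only() {
    # Step 1: upgrade only the named packages; dependencies (e.g. numpy) are untouched.
    ${PIP:-pip} install --upgrade --no-deps "$@" || return 1
    # Step 2: a plain install adds any dependencies that are missing,
    # but does not upgrade ones that are already satisfied.
    ${PIP:-pip} install "$@"
}
```

So `upgrade_only bottleneck` would upgrade bottleneck without triggering a numpy rebuild, as long as the installed numpy still satisfies it.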

Also, the user has implicitly given pip a build-time dependency graph on the command line,
and my take on it is that the (missing) one-by-one behavior is the mental model most users have
by default.

pip could achieve all-or-nothing installs via rollback, or by doing a moist-run into a venv before
touching the system. Or, and this would be just fine, give me a flag and I'll just tell pip I'm willing
to risk it.

The problem isn't the lack of install_requires. pip attempts to discover all of the dependencies before it installs anything. It has to do this because it needs to know what to install. For instance, you gave the example

pip install numpy bottleneck

Ok, so pip first goes out and discovers numpy, it finds it has versions A, B, and C. C is the latest so it selects that for install, then pip goes out and discovers bottleneck, however bottleneck (hypothetically) depends on numpy<C. Now pip goes and deselects C and instead selects B. (Note this isn't exactly how it works at the moment, but it's how it will in the future).

The only way to make it work how you're proposing is to have it, instead of "select" C, "install" C. Then when it discovers that no, it needs B not C, have it uninstall C and then install B. This will be horrifically slow, especially for something like numpy. This isn't something I think is appropriate when the actual underlying issue is that the way someone uses numpy.distutils is fundamentally broken for automatic dependency resolution.
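To make the "select, don't install" distinction concrete, here is a toy sketch: choosing a version is pure bookkeeping over a list of candidate strings, so narrowing numpy from C to B when bottleneck (hypothetically) asks for numpy<C costs nothing, whereas installing C first would mean a build, an uninstall, and a second build. The helper names and the plain dotted-number versions standing in for A, B and C are assumptions; real version ordering handles pre-releases and much more.

```shell
# version_lt A B: succeed if dotted numeric version A is strictly lower than B.
# Toy comparison: sorts numerically on up to three components only.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

# select_below BOUND CANDIDATE...: print the highest candidate strictly below
# BOUND, i.e. what a resolver would pick for a constraint like "numpy<BOUND".
# Nothing is installed; changing the selection is just picking another string.
select_below() {
    bound=$1
    shift
    best=
    for v in "$@"; do
        version_lt "$v" "$bound" || continue
        if [ -z "$best" ] || version_lt "$best" "$v"; then
            best=$v
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

With hypothetical candidates 1.5.2 (A), 1.6.2 (B) and 1.7.0 (C), `select_below 1.7 1.5.2 1.6.2 1.7.0` picks 1.6.2, i.e. B, without ever touching C.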

First, was I at least persuasive enough to merit a reopen? :)

I'm not dictating a solution; you know pip and I only use it. But a --damn-the-torpedoes flag that
might fail (no rollback or other complexities) would be a completely fine solution.

re your points:

  1. broken or not, numpy.distutils only applies to one of the three examples I gave. Even
    if you're right, the other two are still valid.
  2. The "bruteforce search" example you provide has the following case analysis:
  • if the user requests a version that later needs to be replaced, the install will succeed but it may be slow.
  • if no conflict is found, the install succeeds, and quickly (the common case for my, arguably typical, usage).

The case analysis for the current behavior is:

  • The install fails.

For the multiple concrete examples I've seen, that doesn't look like an absurd compromise.
I acknowledge you can construct pathological cases that would make this behave terribly,
but that shouldn't be a blocker if you never/rarely see them in the wild and the upside is supporting
a large enough class of new cases.
And it seems possible to detect madness when it arises and just have pip scream and die.

Any reasonable way to address these concrete examples would be great. Since I'm not volunteering
to sling the code, I can hardly buck for a pip rewrite. This is however a common pip gripe I've encountered
in pydata land.

Also, re the slow case, what's the difference between:

pip install numpy==1.6
pip install bottleneck
# replaces numpy with the version bottleneck wants


pip install numpy==1.6 bottleneck
# which would install numpy twice

in terms of the duration of the install process?

Sorry about the verbosity, I'll try to be more concise for the rest of the thread...

Although pip doesn't have a true resolver (#988), the basic idea of resolving dependencies first and installing once is fundamental, and not something I can imagine changing to work around the cases you mention.

The examples you cite all seem to be build-time dependency issues, i.e. pkgB needs pkgA even to build. The pip/setuptools solution right now is setup_requires. But for various reasons (being hard to compile; projects requiring non-python dependencies; pip's recursive upgrade logic; people not knowing setuptools), some projects choose not to use those keywords.

To be clear, conda hasn't solved this quagmire by having better resolution logic (e.g. a SAT solver like you mention in #988). It solves it by being a "full-stack" management tool that manages all of the non-python dependencies required for any of its packages, and it installs pre-built binaries.

How might it be easier in the future for pip to install some of the projects you mention? Honestly, that's still TBD, and there's a fair amount of discussion on that now on distutils-sig. Wheels and PEP426 will certainly be part of the answer.

The reason I'm leaving this closed is that the core of the issue isn't a problem with pip's install logic; it's a much broader issue of managing external dependencies, which goes beyond pip itself.

It might be worthwhile to leave a pip issue open that points to some of these ongoing discussions, so that new issues can be closed against that.

To answer your later question, there wouldn't be any difference in length of time. However, a proper dependency resolver (which pip doesn't have yet) would determine which version of numpy needs to be installed without installing it twice, which is where the "will take more time" comes from.

To somewhat muddle this issue, there is the --no-deps flag to pip, which should disable dependency resolution altogether and requires you to specify all the dependencies either on the command line or in a requirements.txt file. However, this won't currently solve the numpy/bottleneck problems and such because, as you noted, pip doesn't really grok build-time dependencies (it leaves that up to the build tool itself, generally setuptools, which uses setup_requires).

Personally I'd need to think about this some more. This isn't well supported by any tool right now, and I'm not sure there's a good way of kludging a method in that won't make things worse in the long term while we wait for the real solution. It's possible that with a little tweaking the existing --no-deps flag can be made to work in a way that would support this. My fear there is that isolated builds and a proper build-time dependency specification, which come with PEP426/Metadata 2.0, are a better solution, and that if we un-isolate the builds now we'll end up regretting it once PEP426 is in place.

Reasonable. I hope these issues are tackled down the road as part of the shake-up in python packaging.
An open issue seems in order since these issues remain unresolved while being something users
expect their favorite package manager to handle.

Now, back to the iterating deployment scripts...

I'm not sure if this is informative or helpful to anyone, but I came across this thread while trying to resolve a build-time dependency: a package which requires another package in order to be set up. I tried using setup_requires, and it fails because I need forked versions of both packages. I specify the git paths. Unfortunately, when building the dependent package, it seems to encounter the setup_requires but resolves it using the broken one from PyPI instead of my fork.

I can't seem to find a way around it, so I'm going to just script installing the dependency first.

Here is my utility script, which uses pip-compile + pip-sync + virtualenv.

xargs can be used to install requirements.txt line by line if required:

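pip install --upgrade pip pip-tools
# Retry pip-compile verbosely if the quick 5-second attempt fails or times out
timeout 5 pip-compile || pip-compile -v  # --generate-hashes
# On failure, install line by line. '#' only starts a comment at line start or after
# whitespace, so URL fragments like #egg= survive. Note that xargs -d is GNU-specific.
pip install -r ./requirements.txt || sed -E -e 's/(^|[[:space:]]+)#.*$//' -e '/^[[:space:]]*$/d' ./requirements.txt | xargs -d '\n' -n 1 -t pip install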