tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

Home Page: https://tensorflow.org

How should Python packages depending on TensorFlow structure their requirements?

dustinvtran opened this issue

Many packages build on TensorFlow. For example, our work in Edward uses tensorflow>=1.0.0a0 as an install requirement.

However, this conflicts with tensorflow-gpu, which can no longer be installed because of the requirement specifically on tensorflow. What do you suggest is the best way to handle this?

One option suggested by @gokceneraslan (blei-lab/edward#428 (comment)) is to hack in the dependency according to whether the user has a GPU. Another option, which PrettyTensor and Keras employ, is to not require TensorFlow at all. (Neither option sounds good.)

Also see blei-lab/edward#428. Also looping in the GPflow devs (@jameshensman, @alexggmatthews) in case they have the same problem. (Note I'm raising this as an issue instead of asking on a mailing list, in case this is something that should be changed on TensorFlow's end and not ours.)

I'm pretty sure the fundamental issue here is that PyPI has no way to distinguish wheels built with GPU support from wheels built without it (see PEP 425 for the list of supported compatibility tags). Hence the separate "tensorflow-gpu" distribution on PyPI.

Both of your suggested work-arounds (hacks in setup.py or simply removing problematic dependencies altogether) are commonly done by Python packages for scientific computing.

Do you have some examples? They would be great references for deciding between the work-arounds. (I'm leaning towards removing the dependence.)

@dustinvtran Here's a discussion about this for patsy: pydata/patsy#5

@shoyer interesting example. I wonder if that explains why pip install --upgrade $TF_BINARY_URL replaces MKL numpy on our machines with OpenBLAS numpy (TF's setup.py lists 'numpy >= 1.11.0' in REQUIRED_PACKAGES).

@yaroslavvb yes, that's likely the case. pip install only recently got the option --upgrade-strategy=only-if-needed, which is probably what you want to use here. Eventually this will become the default behavior (pypa/pip#3871).

@dustinvtran See here for an example which detects a CUDA installation and then selects either tensorflow or tensorflow-gpu depending on CUDA availability.
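As a rough illustration of that idea (this is not the linked example, only a minimal sketch), a setup.py could pick the requirement string with a heuristic like the one below. The helper names cuda_available and tensorflow_requirement are hypothetical, and treating "CUDA_HOME is set or nvcc is on PATH" as proof of a CUDA installation is an assumption. The choice is made on the machine that runs setup.py, so it would not carry over to a prebuilt wheel.

```python
import os
import shutil


def cuda_available(environ=None, which=shutil.which):
    """Heuristic (assumption): CUDA is present if CUDA_HOME is set or nvcc is on PATH."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("CUDA_HOME")) or which("nvcc") is not None


def tensorflow_requirement(min_version="1.0.0", has_cuda=None):
    """Return the requirement string to put in install_requires."""
    if has_cuda is None:
        has_cuda = cuda_available()
    # Both distributions provide the same importable module; only the name differs.
    name = "tensorflow-gpu" if has_cuda else "tensorflow"
    return "{}>={}".format(name, min_version)


if __name__ == "__main__":
    # In a real setup.py this string would be appended to install_requires.
    print(tensorflow_requirement())
```

Passing has_cuda explicitly makes it possible to override the detection, for instance when building an image on a machine without a GPU.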

@sjperkins Detecting whether an NVIDIA GPU is available will not work when installing from a Dockerfile into a Docker image, or more generally whenever you install on a machine other than the one you intend to run on.

I was checking if it could be done with setup.py extras:
https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies

but I don't think it's possible.

IMO the most straightforward approach would be to drop tensorflow from the install_requires list, try to import it in setup.py, catch the ImportError, and raise an exception telling the user that either tensorflow or tensorflow-gpu must be installed.
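A minimal sketch of that check, assuming it is called near the top of setup.py before setup(). The function names are hypothetical. It uses importlib.util.find_spec instead of a real import so that TensorFlow itself is not loaded during installation, and it relies on the fact that both tensorflow and tensorflow-gpu install the same tensorflow module.

```python
import importlib.util


def tensorflow_is_importable(find_spec=importlib.util.find_spec):
    """True if a top-level module named 'tensorflow' can be located."""
    return find_spec("tensorflow") is not None


def require_tensorflow(is_importable=tensorflow_is_importable):
    """Fail early with an actionable message if TensorFlow is missing."""
    if not is_importable():
        raise RuntimeError(
            "TensorFlow is required but was not found. Install either "
            "'tensorflow' or 'tensorflow-gpu' and then re-run the installation."
        )
```

One caveat with this design: the check only runs when setup.py itself is executed, so it would be skipped for installs from a prebuilt wheel.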

> Detecting whether an NVIDIA GPU is available will not work when installing from a Dockerfile into a Docker image, or more generally whenever you install on a machine other than the one you intend to run on.

@t0mk One still needs to install CUDA in the Docker container. The example I provided will still compile CUDA code if GPUs aren't available, but it won't be able to target specific architectures and will default to sm_30.

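Currently I am using extras_require in setup.py:

from setuptools import setup

setup(
    name="my_package",
    # ... other setup() arguments ...
    install_requires=[],  # list of dependencies EXCLUDING tensorflow
    extras_require={
        # Installed with: pip install my_package[tf]
        "tf": ["tensorflow>=1.0.0"],
        # Installed with: pip install my_package[tf_gpu]
        "tf_gpu": ["tensorflow-gpu>=1.0.0"],
    },
)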

The problem here is that if the user does not say which variant of the package they want (for example my_package[tf_gpu]), tensorflow won't be required at all. But I think this is at least better than not specifying tensorflow anywhere.
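Because a plain install without an extra pulls in neither distribution, a package using extras could also verify at import time that one of them is present. The following is only a sketch under assumptions: installed_tensorflow is a hypothetical helper, the two distribution names are the ones used in this thread, and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata

# Distribution names discussed in this thread.
CANDIDATES = ("tensorflow", "tensorflow-gpu")


def installed_tensorflow(version_of=metadata.version):
    """Return (distribution_name, version) for the first candidate found, else None."""
    for name in CANDIDATES:
        try:
            return name, version_of(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow()
    if found is None:
        print("Install my_package[tf] or my_package[tf_gpu] to get TensorFlow.")
    else:
        print("Found {} {}".format(*found))
```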

For Edward I decided to remove the explicit dependence on TensorFlow and make it part of extras_require. Not ideal, but I think it's the best solution at present. Feel free to close this issue. It would be nice, though, to have this kind of recommended advice in the docs.

In case this helps others, we use this Poetry dependency specification to target different TF runtimes based on operating system and architecture. Note that these requirements are not used for training, only for making inferences using a pre-trained model.

[tool.poetry.dependencies]
# Use TensorFlow Lite on Linux to produce slim production Docker images.
tflite-runtime = { version = "*", markers = "sys_platform == 'linux'" }

# TensorFlow Lite only ships wheels for Linux, so use full tensorflow on other platforms.
tensorflow = { version = "*", markers = "sys_platform != 'linux' and platform_machine != 'arm64'" }

# ARM Mac wheels are hosted under a different PyPI package name.
tensorflow-macos = { version = "*", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'" }
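To make the marker logic above easier to follow, here is a small sketch that mirrors the three markers in plain Python. The function select_tf_distribution is hypothetical and is not part of Poetry or pip; it only reproduces the conditions. It relies on the environment-marker definitions, where sys_platform corresponds to sys.platform and platform_machine to platform.machine().

```python
import platform
import sys


def select_tf_distribution(sys_platform=None, platform_machine=None):
    """Return the distribution whose marker matches, or None if none does."""
    if sys_platform is None:
        sys_platform = sys.platform
    if platform_machine is None:
        platform_machine = platform.machine()

    # tflite-runtime: sys_platform == 'linux'
    if sys_platform == "linux":
        return "tflite-runtime"
    # tensorflow-macos: sys_platform == 'darwin' and platform_machine == 'arm64'
    if sys_platform == "darwin" and platform_machine == "arm64":
        return "tensorflow-macos"
    # tensorflow: sys_platform != 'linux' and platform_machine != 'arm64'
    if platform_machine != "arm64":
        return "tensorflow"
    # No marker matches, e.g. a non-Linux, non-macOS system reporting 'arm64'.
    return None


if __name__ == "__main__":
    print(select_tf_distribution())
```

Marker comparisons are case-sensitive string comparisons, so a machine value such as 'ARM64' is not treated as 'arm64' here either.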