pypa / distutils

distutils as found in cpython

get_python_inc returns wrong include directory for bundled Pythons

zjp opened this issue

My group maintains a project which bundles Python and so we compile packages with our bundled Python and not the system Python on our machines to ensure a stable environment for users. After #173, sysconfig.get_python_inc() always returns the Python headers in /Library/Frameworks instead of the header directory relative to our bundled Python.

What is the value of spec_prefix?

Do you have any samples you can share?

It is 'None' both before and after.

@rnhmjoj Would you be able to analyze or help zjp analyze what's going on to characterize the missed expectation?

Sorry, but I'm just a very stubborn guy that wanted to cross compile python packages, bisected all the way to that commit and fixed an obvious logical inconsistency: I know nothing more about this project or how the python build system works.

@zjp, would you be able to give instructions on how to reproduce this error? Is there a way for me to get a copy of your bundled python that I could run?

@isuruf You can download the daily build of ChimeraX here: https://www.cgl.ucsf.edu/chimerax/download.html

We give users access to a Python shell to script the application as well as a limited interface to interact with pip.

I dragged a copy of the 1.5 daily build to my desktop on an M1 Mac laptop (this just cuts down the time to open the app dramatically).

I then opened an ipython shell by going to the top menu bar: Tools -> General -> Shell.

>>> from distutils import sysconfig
>>> sysconfig.get_python_inc()
'/Users/zjp/Desktop/ChimeraX_Daily 2.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9'

Then, in ChimeraX's own command line, I typed pip install setuptools upgrade true to get setuptools 65.4.0, whose vendored distutils includes the PR from the original post. I restarted ChimeraX and repeated the steps above.

Afterwards:

>>> from distutils import sysconfig
>>> sysconfig.get_python_inc()
'/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9'

I did a detailed analysis and proposed a solution in #179. @jaraco @rnhmjoj @isuruf @FFY00 @zjp
Maybe you can consider this approach.

I just found that the same 'Fix, again, finding headers during cross compiling' broke Yocto builds - which is a fairly important cross compile case :-)

You can see the newly occurring failures here:
https://autobuilder.yoctoproject.org/typhoon/#/builders/20/builds/6466/steps/11/logs/stdio
(particularly when building python3-cffi)

I'm investigating.

Seems like there are bogus paths in our configs that this change exposes. So the problem seems to be on our side, you don't need to do anything.

In that issue, wqh17101 reports that the issue can be attributed to precedence defined at:

return (
    _get_python_inc_posix_python(plat_specific)
    or _get_python_inc_from_config(plat_specific, spec_prefix)
    or _get_python_inc_posix_prefix(prefix)
)

In particular, they propose giving precedence to _get_python_inc_posix_prefix.

This proposed approach is untenable, because _get_python_inc_posix_prefix always returns a value, so _get_python_inc_from_config would never be used.
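The short-circuit behaviour behind that objection can be illustrated with a minimal sketch. The three functions below are hypothetical stand-ins for the distutils helpers, and the paths are made up; only the `or` chain mirrors the snippet above.

```python
# Hypothetical stand-ins for the three distutils helpers; the paths are made up.
calls = []

def from_python():
    """Stand-in for _get_python_inc_posix_python: None outside a source build."""
    calls.append("python")
    return None

def from_config():
    """Stand-in for _get_python_inc_from_config: returns the configured path."""
    calls.append("config")
    return "/configured/include/python3.9"

def from_prefix():
    """Stand-in for _get_python_inc_posix_prefix: always returns a path."""
    calls.append("prefix")
    return "/prefix/include/python3.9"

# Current order: the configured value wins whenever it is set.
current = from_python() or from_config() or from_prefix()
current_calls = list(calls)

# Proposed order: the prefix helper always returns a truthy string,
# so `or` short-circuits and from_config() is never evaluated.
calls.clear()
swapped = from_python() or from_prefix() or from_config()
swapped_calls = list(calls)

print(current, current_calls)
print(swapped, swapped_calls)
```

With the swapped order, the config-based helper is never reached, which is why simply reordering would amount to removing it.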

I believe kanavin's finding sheds some light on the issue. I suspect that users encountering this issue may have a local configuration that's triggering this code path.

I suggest finding the source of CONFINCLUDEPY and INCLUDEPY in sysconfig's config vars.

It seems to me that the issue encountered by wqh17101 may be different than that encountered by zjp, because zjp is reporting that the custom include path is being overridden by a system one.

Anyone still experiencing the issue, can you report the output of:

python -c "import distutils.sysconfig as sc; print(sc.get_config_vars('CONFINCLUDEPY', 'INCLUDEPY')); print('python_build:', sc.python_build); print('posix_python:', sc._get_python_inc_posix_python(False)); print('from_config:', sc._get_python_inc_from_config(False, None)); print('posix_prefix:', sc._get_python_inc_posix_prefix(sc.BASE_PREFIX))"

(or execute the same code in a Python interpreter).

Here is the output from the distutils bundled with setuptools, both before and after the commit referenced in this ticket made it into setuptools' vendored distutils:

ChimeraX git:(develop) Ⲗ ./ChimeraX.app/Contents/bin/python3.9

Python 3.9.11 (v3.9.11:2de452f8bf, Mar 16 2022, 10:44:40)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import distutils.sysconfig as sc; print(sc.get_config_vars('CONFINCLUDEPY', 'INCLUDEPY')); print('python_build:', sc.python_build); print('posix_python:', sc._get_python_inc_posix_python(False)); print('from_config:', sc._get_python_inc_from_config(False, None)); print('posix_prefix:', sc._get_python_inc_posix_prefix(sc.BASE_PREFIX))
['/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9', '/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9']
python_build: False
posix_python: None
from_config: None
posix_prefix: /Users/zjp/git/rbvi/ChimeraX/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9
>>> ^D

ChimeraX git:(develop) Ⲗ ./ChimeraX.app/Contents/bin/python3.9 -I -m pip install --upgrade setuptools
Requirement already satisfied: setuptools in ./ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages (65.1.1)
Collecting setuptools
  Using cached setuptools-65.5.0-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 65.1.1
    Uninstalling setuptools-65.1.1:
      Successfully uninstalled setuptools-65.1.1
Successfully installed setuptools-65.5.0

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: /Users/zjp/git/rbvi/ChimeraX/ChimeraX.app/Contents/bin/python3.9 -m pip install --upgrade pip
 ChimeraX git:(develop) Ⲗ ./ChimeraX.app/Contents/bin/python3.9
Python 3.9.11 (v3.9.11:2de452f8bf, Mar 16 2022, 10:44:40)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import distutils.sysconfig as sc; print(sc.get_config_vars('CONFINCLUDEPY', 'INCLUDEPY')); print('python_build:', sc.python_build); print('posix_python:', sc._get_python_inc_posix_python(False)); print('from_config:', sc._get_python_inc_from_config(False, None)); print('posix_prefix:', sc._get_python_inc_posix_prefix(sc.BASE_PREFIX))
['/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9', '/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9']
python_build: False
posix_python: None
from_config: /Library/Frameworks/Python.framework/Versions/3.9/include/python3.9
posix_prefix: /Users/zjp/git/rbvi/ChimeraX/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9

[root@aaaaa xxxxxx]# python3 -c "import distutils.sysconfig as sc; print(sc.get_config_vars('CONFINCLUDEPY', 'INCLUDEPY')); print('python_build:', sc.python_build); print('posix_python:', sc._get_python_inc_posix_python(False)); print('from_config:', sc._get_python_inc_from_config(False, None)); print('posix_prefix:', sc._get_python_inc_posix_prefix(sc.BASE_PREFIX))"
['/devcloud/ws/sirFh/workspace/j_VWBQ8FGA/aaa_python/third_build/python/include/python3.9', '/devcloud/ws/sirFh/workspace/j_VWBQ8FGA/aaa_python/third_build/python/include/python3.9']
python_build: False
posix_python: None
from_config: /devcloud/ws/sirFh/workspace/j_VWBQ8FGA/aaa_python/third_build/python/include/python3.9
posix_prefix: /opt/aaa/python/python-3.9.2/include/python3.9

Thanks both for that information. Good news is python_build isn't implicated at all.

I can see from that information that [CONF]INCLUDEPY has always been set but wasn't being honored until the intentional change to honor that value.

You can find where that value comes from by doing the following:

>>> import sysconfig
>>> import importlib
>>> data = importlib.import_module(sysconfig._get_sysconfigdata_name())
>>> data.__file__
'/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_sysconfigdata__darwin_darwin.py'

That file is where INCLUDEPY should be defined. That value was generated when that copy of Python was built. Perhaps the issue is that when cross-compiling, the target Python's configuration isn't taking precedence. Maybe the cross-compiled Python needs to supply the name returned by _get_sysconfigdata_name.
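As a complementary check, here is a minimal sketch that reports the configured header directories and whether they exist on disk for the running interpreter. It uses only the public sysconfig.get_config_var; the function name and the shape of the report are illustrative assumptions, not part of distutils.

```python
import os
import sysconfig

def describe_include_config():
    """Report CONFINCLUDEPY/INCLUDEPY and whether each points at a real directory."""
    report = {}
    for name in ("CONFINCLUDEPY", "INCLUDEPY"):
        value = sysconfig.get_config_var(name)  # None when the variable is not defined
        report[name] = {
            "value": value,
            "exists": bool(value) and os.path.isdir(value),
        }
    return report

for name, info in describe_include_config().items():
    print(name, info["value"], "exists" if info["exists"] else "missing")
```

A value that is set but reported as missing would be consistent with a Python that was relocated after it was built.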

Can someone try replicating the issue in a Dockerfile?

For what it's worth, we (yocto) have the same cross compile problem (whether to use build host vs target specific sysconfigdata), and solve it by patching python to look at the host version of it, unless there is a magic unix environment variable. It's not super elegant, but I do not think upstream python currently offers a better option.

Actually, even patching is not necessary; I just looked into it properly, and pointing python to a completely non-default sysconfigdata can be done thusly prior to starting python ($STAGING_LIBDIR is a yocto specific variable, adjust as needed):

        export _PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata"
        export PYTHONPATH=${STAGING_LIBDIR}/python-sysconfigdata
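For anyone driving the build from Python rather than a shell, the same two variables can be set on a child process environment. This is a rough sketch: the helper name and the example directory are hypothetical, and unlike the shell lines above it prepends to an existing PYTHONPATH instead of replacing it, which is a design choice of this sketch.

```python
import os

def sysconfigdata_env(module_dir, module_name="_sysconfigdata", base_env=None):
    """Return a copy of the environment that points Python at a custom sysconfigdata module."""
    env = dict(os.environ if base_env is None else base_env)
    # Name of the module to import in place of the default _sysconfigdata_* module.
    env["_PYTHON_SYSCONFIGDATA_NAME"] = module_name
    # Make the directory holding that module importable, keeping any existing entries.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = module_dir if not existing else module_dir + os.pathsep + existing
    return env

# Hypothetical staging directory, analogous to ${STAGING_LIBDIR}/python-sysconfigdata.
env = sysconfigdata_env("/staging/lib/python-sysconfigdata", base_env={})
print(env)
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching the target interpreter.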

This change in behaviour caused breakage for us too (when upgrading from setuptools 65.1.1 to 65.2.0, which includes #173).

In our case, the Python install has been relocated from its original install --prefix to another location, since the archive in which it is shipped has to be unpacked in an arbitrary directory on another machine. (Example archive, built for Ubuntu 22.04: https://heroku-buildpack-python.s3.us-east-1.amazonaws.com/heroku-22/runtimes/python-3.11.0.tar.gz)

Python itself handles this fine, since it dynamically adjusts sys.prefix, sys.exec_prefix and sys.base_prefix to point at the new location (as partly documented here).

As such, distutils' _get_python_inc_posix_prefix() will actually return the correct value, since it's passed either sys.base_prefix or sys.base_exec_prefix from get_python_inc() (depending on whether plat_specific is set or not).
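To illustrate why the prefix-based path follows a relocated install, here is a simplified sketch of deriving the header directory from a prefix. It is not the distutils implementation: the function name is hypothetical, and it ignores ABI build flags and alternative implementations such as PyPy.

```python
import os
import sys

def include_dir_from_prefix(prefix=None, version=None):
    """Build <prefix>/include/pythonX.Y from a prefix that Python computes at runtime."""
    # sys.base_prefix is adjusted by the interpreter when the install is moved.
    prefix = sys.base_prefix if prefix is None else prefix
    version = version or "{}.{}".format(*sys.version_info[:2])
    return os.path.join(prefix, "include", "python" + version)

print(include_dir_from_prefix())
```

Because the prefix is computed at startup rather than baked in at build time, the result tracks wherever the archive was unpacked.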

The issue is that after #173, the order of precedence here favours the path returned by _get_python_inc_from_config() (instead of falling back to _get_python_inc_posix_prefix()):

_get_python_inc_posix_python(plat_specific)
or _get_python_inc_from_config(plat_specific, spec_prefix)
or _get_python_inc_posix_prefix(prefix)

...even though the path returned by _get_python_inc_from_config() is bogus, since it refers to the original Python install location, which no longer exists.

It seems the options to fix this might be:

  1. Adjust the order of precedence in _get_python_inc_posix(), such that _get_python_inc_posix_prefix() is checked before _get_python_inc_from_config().
  2. Or, have _get_python_inc_posix() return the location of all the header paths (not just one), and pass all paths to the compiler (that way if some paths don't exist, the compiler will just fall back to the next in the -I <path_1> -I <path_2> sequence).
  3. Or, have _get_python_inc_posix() still return only one path, but have it first check that the calculated path actually exists before using it, and if it doesn't, move onto the next path in the precedence list.
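Option 3 could look roughly like the sketch below. The helper name is hypothetical and this is not the distutils code; it only shows the idea of skipping candidates that are unset or do not exist on disk.

```python
import os

def first_existing_dir(candidates):
    """Return the first candidate that is set and is an existing directory, else None."""
    for path in candidates:
        if path and os.path.isdir(path):
            return path
    return None

# Made-up example: a stale configured path followed by the current directory.
print(first_existing_dir(["/nonexistent/original/prefix/include/python3.11", os.getcwd()]))
```

Under this scheme a configured path is still honoured when it exists, so explicit configuration for cross-compiling keeps its precedence, while a stale path from a relocated install is skipped.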

Or how about not having a bogus value in the config? You can fix the one you have, or you can supply a correct custom one as shown above.

So the "bogus value" in the config, is presumably coming from the --prefix set during initial configure - we're not adding an explicit broken config/env var after the fact. We can't use the actual path as --prefix during the build, since we don't know in advance where the Python install will be run from (it's end-user controlled).

As a workaround, we can try to rewrite the config (or, say, set CPATH manually), however:

  1. Python is otherwise relocatable (it returns the correct new install locations for sys.prefix, sys.exec_prefix and sys.base_prefix - distutils is just not using them), so it would be ideal if it could continue to be.
  2. This is a breaking change in distutils behaviour (and thus setuptools), which at first glance seems unintended, so at the least perhaps needs discussion/release notes/..., and ideally returning to the old behaviour if it's not intended.

I'm not so sure that doing automatic guessing is the correct approach. I like being able to specify things explicitly in config files and have those specifications take precedence, because we can and do need to change those things when cross-compiling python and various python-based items for example, and in that case guessing by native python that is running on the build host only gets in the way.

As for your situation, how about simply deleting the config file that has bogus stuff in it before you package up python? If something breaks then, then it should be fixed up to fall back to reasonable built-in (or guessed from executable binary location) defaults.

I will explore that option too, thank you.

Other alternatives I am considering:

  • Setting CPATH to the new include/pythonX.Y location explicitly
  • Rewriting the hardcoded prefix in eg lib/pkgconfig/python-X.Y.pc so pkg-config works (I don't know how many things check pkg-config for paths too)

Checking so far, it seems bin/python-config already rewrites the include paths, so handles relocated installs fine (for things that actually use python-config):
https://github.com/python/cpython/blob/d460c8ec52716a37080d31fdc0f673edcc98bee8/Misc/python-config.sh.in#L15-L36

See also discussion on the breaking change in setuptools:
pypa/setuptools#3657

I would lean toward changing the order of precedence to prefer _get_python_inc_posix_prefix(), but it does appear to work to just replace the values in INCLUDEDIR and INCLUDEPY with ''. You can't just remove sysconfigdata altogether because it will break setuptools.

This also impacts the version of Python included in Xcode if installed under a different name than Xcode.app.

I would lean toward changing the order of precedence to prefer _get_python_inc_posix_prefix(), but it does appear to work to just replace the values in INCLUDEDIR and INCLUDEPY with ''.

Several people have suggested changing the order of precedence, but if that's done, it effectively disables the includes from config (as _get_python_inc_posix_prefix always returns something, so _get_python_inc_from_config never gets called).

You can't just remove sysconfigdata altogether because it will break setuptools.

What about removing the values of sysconfigdata that are leading to the incorrect config?


  • Or, have _get_python_inc_posix() return the location of all the header paths (not just one), and pass all paths to the compiler (that way if some paths don't exist, the compiler will just fall back to the next in the -I <path_1> -I <path_2> sequence).

  • Or, have _get_python_inc_posix() still return only one path, but have it first check that the calculated path actually exists before using it, and if it doesn't, move onto the next path in the precedence list.

These proposals may be worth exploring. I'm still not convinced that it should be distutils' responsibility to accommodate misconfigured environments. I'd prefer instead for the environments to be properly configured so that distutils can simply honor the configuration. Before we explore these options, let's show definitively that misconfigured environments are to be expected (i.e. that there isn't a reliable way for a portable Python environment to be configured).

Also probably related: pypa/setuptools#3786

This could be considered a python bug in sysconfig._init_posix(). PyPy overrides this by using a _sysconfigdata.py with dynamic values from sys. Over the years PyPy has assembled the minimum required set of variables needed to compile/cross-compile in that file. There is also a mechanism to create the _sysconfigdata_* file in sysconfig.py

I'm finally able to replicate this issue after discovering that python-build-standalone is affected:

$ git clone gh://indygreg/python-build-standalone
$ cd python-build-standalone
$ git checkout 20230116
$ py build-macos.py
...
$ cd dist
$ pip-run jaraco.zstd -- -m jaraco.zstd -e *.zst
$ cd python
$ install/bin/python3 -c "import distutils.sysconfig as sc; print(sc.get_python_inc())"
/install/include/python3.10
$ install/bin/python3 -m pip install --no-binary hello-c-extension hello-c-extension
...
      src/cmod/_cmodule.cc:2:10: fatal error: 'Python.h' file not found
      #include <Python.h>
               ^~~~~~~~~~

In #200, I've added a test and implemented the proposed workaround to only return the configured dir if it exists.

Thanks for looking into this. I have one follow up question: now that a fix is in place, should the order of precedence be changed to prefer the nearest Python?

I patched my copy of distutils to include your change, then built a wheel with ChimeraX (which uses a portable Python). I also have a system Python on my mac at /Library/Frameworks.

We don't want users to use subtly different versions of Python to build wheels that may have binary extensions, just ours. After applying the fix, I still see that Python.h in /Library is preferred over Python.h inside the ChimeraX distribution (though, after mv Python.framework Python.framework.backup the correct location was used).

I think it makes sense to try and get the header associated with the Python that's been invoked instead of looking for system headers first.

I opened a new issue.