When $TMPDIR is within $PWD, creating wheel dies on infinite recursion
hroncok opened this issue · comments
Describe the bug
Due to the logic in
yarl/packaging/pep517_backend/_backend.py
Lines 197 to 204 in bd5ff24
$TMPDIR
to itself when it is inside $PWD
.
In Fedora, when we build wheels during building RPMs, we set $TMPDIR
to be in $PWD
-- it is a requirement to let RPM know about all the sources that have been built (e.g. to successfully create debugsource packages).
To Reproduce
$ git clone git@github.com:aio-libs/yarl.git
$ cd yarl
[yarl (master)]$ mkdir .tmp-in-tree
[yarl (master %)]$ export TMPDIR=$PWD/.tmp-in-tree
[yarl (master %)]$ pip wheel . --no-deps
Expected behavior
The folder needs to be excluded from copying. See my fix for the same problem in pip: pypa/pip#7873
Logs/tracebacks
$ pip wheel . --no-deps
Processing .../yarl
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: yarl
Building wheel for yarl (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for yarl (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [33 lines of output]
**********************
* Accelerated build *
**********************
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File ".../yarl/packaging/pep517_backend/_backend.py", line 287, in build_wheel
with maybe_prebuild_c_extensions(
File "/usr/lib64/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File ".../yarl/packaging/pep517_backend/_backend.py", line 261, in maybe_prebuild_c_extensions
with build_dir_ctx:
File "/usr/lib64/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File ".../yarl/packaging/pep517_backend/_backend.py", line 202, in _in_temporary_directory
copytree(src_dir, tmp_src_dir, symlinks=True)
File "/usr/lib64/python3.12/shutil.py", line 588, in copytree
return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/shutil.py", line 542, in _copytree
raise Error(errors)
shutil.Error: [('.../yarl/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/....[Errno 36] File name too long: ...tmp-yarl-pep517-pspb0f32'")]
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for yarl
Failed to build yarl
ERROR: Failed to build one or more wheels
Python Version
$ python --version
Python 3.12.1
But happens with any.
multidict Version
$ python -m pip show multidict
N/A (happens at build time)
yarl Version
$ python -m pip show yarl
N/A (happens at build time)
OS
Fedora Linux
Additional context
This was reported to us as a problem in one of our tools: befeleme/pyp2spec#44
The same issue existed in pip 4 years ago: pypa/pip#7872
Code of Conduct
- I agree to follow the aio-libs Code of Conduct
Brilliant! Thanks for the details! I saw this problem in ansible/pylibssh (where I have RPM smoke tests and Packit integrated) but haven't had time to figure it our myself. I'll try to patch this across the @aio-libs projects that share the same code and there. Thanks for solving this mystery!
I am writing the ignore function and will test it.
@hroncok what do you think the correct behavior should be? I was thinking maybe erroring out early asking the caller to use a different tmp directory? I was also thinking (earlier, before I discovered the bug) of having a config setting for explicitly requesting in and out-of-tree builds passed to the backend that would give the caller a way of overriding the defaults.
This implements the behavior as I think is the correct behavior:
@contextmanager
def _in_temporary_directory(src_dir: Path) -> t.Iterator[None]:
with TemporaryDirectory(prefix='.tmp-yarl-pep517-') as tmp_dir:
with chdir_cm(tmp_dir):
tmp_src_dir = Path(tmp_dir) / 'src'
def _ignore(d, _):
"""
Prevent temporary directory to be copied into self
recursively forever.
Fixes https://github.com/aio-libs/yarl/issues/992
"""
if Path(d) == tmp_src_dir.parent:
return [tmp_src_dir.name]
return []
copytree(src_dir, tmp_src_dir, symlinks=True, ignore=_ignore)
os.chdir(tmp_src_dir)
yield
@hroncok I suppose this could work. Though I'm not sure how safe it's to put tmp dir inside a project dir in general. It seems like similar problems are bound to happen with other tools. I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.
With that, I don't know if it's reasonable to allow such a silent behavior. Wouldn't it be safer to have the downstreams opt-in to it with an error by default? More globally, I can imagine various build backends picking up garbage files from tmp just because it's laying around in the workdir. Should this be solved in a systemic way in the distro macros? Can it be changed to something like TMPDIR="%{_buildrootdir}/tmp"
?
@mgorny to get another downstream perspective, could you share any insights why Gentoo doesn't put tmp in the source checkout dir?
This implements the behavior as I think is the correct behavior:
@contextmanager def _in_temporary_directory(src_dir: Path) -> t.Iterator[None]: with TemporaryDirectory(prefix='.tmp-yarl-pep517-') as tmp_dir: with chdir_cm(tmp_dir): tmp_src_dir = Path(tmp_dir) / 'src' def _ignore(d, _): """ Prevent temporary directory to be copied into self recursively forever. Fixes https://github.com/aio-libs/yarl/issues/992 """ if Path(d) == tmp_src_dir.parent: return [tmp_src_dir.name] return [] copytree(src_dir, tmp_src_dir, symlinks=True, ignore=_ignore) os.chdir(tmp_src_dir) yield
Confirming that this does seem to fix the build issue I was having in befeleme/pyp2spec#44.
I suppose this could work.
It works for pip ;)
Though I'm not sure how safe it's to put tmp dir inside a project dir in general.
Generally, it is quite safe. It took 4 years to find a project that was negatively impacted.
In my head, taking the whole pwd and starting copying it somewhere is unsafe by design. What if the working tree is dirty and contains very large files?
It seems like similar problems are bound to happen with other tools.
You mean other tools that put tmpdir in pwd will also be impacted by the current implementation of _in_temporary_directory? Yes.
Or do you mean other build backends will have the same problem when we put tmpdir in pwd? Because so far there hasn't been much.
I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.
That looks nice, but I am afraid it would be too hard for us to do it that way now. Traditionally, rpms are built directly in unpacked source tree -- we would need to adjust all the Python packages to create subdirectories in rpm's %prep
.
I suppose this could work.
It works for pip ;)
Though I'm not sure how safe it's to put tmp dir inside a project dir in general.
Generally, it is quite safe. It took 4 years to find a project that was negatively impacted.
Yeah, but it still seems like a can of worms. Especially as the resulting dists end up in the same tree by default.
In my head, taking the whole pwd and starting copying it somewhere is unsafe by design. What if the working tree is dirty and contains very large files?
Yes, I was considering using git worktree
to get the clean source out of the Git tree. Though, while implementing this, I ended up forgetting about that idea and simply used a cp -r
.
My idea was to essentially prevent accidental local modifications from influencing the dists that are going to be uploaded for public consumption. And default to in-tree builds for editable installs since this is what's needed during development.
I recognize that this approach may not be ideal for those applying patches while repackaging, but I wanted to minimize the probability of auto-including junk.
It seems like similar problems are bound to happen with other tools.
You mean other tools that put tmpdir in pwd will also be impacted by the current implementation of _in_temporary_directory? Yes.
Or do you mean other build backends will have the same problem when we put tmpdir in pwd? Because so far there hasn't been much.
Both, I suppose. PEP 517 does not demand building in a certain directory while giving the backends the freedom to do whatever for as long as the communication interface and frontend/backend boundary is respected. PEP 660 kinda implies an in-tree build, though, due to the nature of editable installs.
I guess the point is that the layout that pyproject-rpm-macros
works most of the time is an accident because it relies on something that is not guaranteed by the standards currently in place. And so, I argue that there should be a mechanism to change that.
I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.
That looks nice, but I am afraid it would be too hard for us to do it that way now. Traditionally, rpms are built directly in unpacked source tree -- we would need to adjust all the Python packages to create subdirectories in rpm's
%prep
.
Well, I had to point it out anyway.. I suppose the current way out is to implement that config setting configuration I was thinking about to explicitly request building in-tree. I imagine this might be useful to other downstreams too.
I'll probably integrate your patch with some conditionals and the PEP 517 config setting on top to address this. And then, you'll need to adjust the RPM spec to pass the in-tree setting (I know that it's doable).
I got to looking into this and one observation is that using python -m build
performs copying and chdir
first, while python -m pip wheel
doesn't. That's why this wasn't reproducible easily.
Thanks.