gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.

Home Page:http://gitpython.readthedocs.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Submodule.__init__ `parent_commit` conversion/validation is implied but not done

EliahKagan opened this issue · comments

The parent_commit parameter of Submodule.__init__ is documented this way:

:param parent_commit:
See :meth:`set_parent_commit`.

Submodule.__init__ binds this directly to the private _parent_commit attribute:

self._parent_commit = parent_commit

But this is at odds with the documented relationship to Submodule.set_parent_commit. That method's commit parameter corresponds to the parent_commit parameter of Submodule.__init__, in that its commit parameter is used to identify the commit, if any, to set as _parent_commit. However, set_parent_commit performs both conversion and validation.

This is the relevant fragment of set_parent_commit's docstring:

:param commit:
Commit-ish reference pointing at the root_tree, or ``None`` to always point
to the most recent commit.

When commit is None, it sets None to _parent_commit. Otherwise, however, commit may not already be a Commit object, and that is okay, because a commit is looked up from it:

pcommit = self.repo.commit(commit)

That's the conversion. Then validation is performed, with _parent_commit ending up set to the commit that commit identified only if there is such a suitable commit:

pctree = pcommit.tree
if self.k_modules_file not in pctree:
raise ValueError("Tree of commit %s did not contain the %s file" % (commit, self.k_modules_file))
# END handle exceptions
prev_pc = self._parent_commit
self._parent_commit = pcommit
if check:
parser = self._config_parser(self.repo, self._parent_commit, read_only=True)
if not parser.has_section(sm_section(self.name)):
self._parent_commit = prev_pc
raise ValueError("Submodule at path %r did not exist in parent commit %s" % (self.path, commit))
# END handle submodule did not exist
# END handle checking mode

The type annotations do not reveal the intent, as they are among those using Commit_ish that need to be updated with the fix for Commit_ish, and that I am fixing up in #1859. My immediate motivation for opening this issue is that I'm having trouble figuring out how to annotate them, because due to the inconsistency between the docstring and the implementations, I don't know what is intended to be accepted.

Either the documentation should be updated, which could be part of #1859, or the code should be fixed to perform any expected validation and conversion and a test case added to check that this is working, which would be best done separately from #1859 (lest its scope expand ever further). I am not sure which. For #1859, it is likely sufficient for me to know what is intended, so full progress on this is not needed to finish #1859. It is my hope and also strong guess that this issue is not a blocker for #1859.

This should not be confused with #1866, which is about the parent_commits parameter of IndexFile.commit rather than the parent_commit parameter of Submodule.__init__ (and which, unlike this, really is about annotations).

Thanks a lot for documenting the issue.

As this implementation was intended to be 'a more easily usable' version of git submodules (back then they were much rougher than they are now), I'd think it's better to let the code match the documented behaviour, assuming that wouldn't unnecessarily narrow its applicability but instead makes it safer. However, it's hard for me to be more assertive here.

I worry about the possibility that binding something other than a GitPython Commit (or None) object to the private _parent_commit attribute might actually be relied on. Because much interaction with repositories is through the git executable where GitPython marshals Python objects into command-line arguments by using their str, often more types than intended are possible to use successfully.

To be clear, I'm not suggesting GitPython forever continue to support all possible wrong uses that happen somehow to be working, only that maybe this hasn't been found before because the asymmetry is doing something valuable that should be retained. I'll look into this a bit further by seeing how _parent_commit is used, and how the methods that access it are themselves used.

I'll report back, at least to say whether I was able to find enough information to proceed with the related part of #1859.

I've looked into this a bit further. First, I should mention that some of the validation I showed above is not done by default in Submodule.set_parent_commit:

if self.k_modules_file not in pctree:
raise ValueError("Tree of commit %s did not contain the %s file" % (commit, self.k_modules_file))

But then also, related to that, there is this near the end, which I had not shown:

# Update our sha, it could have changed.
# If check is False, we might see a parent-commit that doesn't even contain the
# submodule anymore. in that case, mark our sha as being NULL.
try:
self.binsha = pctree[str(self.path)].binsha
except KeyError:
self.binsha = self.NULL_BIN_SHA

The potentially expensive validation is not done by default--check defaults to False--so a plain reading of the Submodule.__init__ docstring saying to see set_parent_commit for the parent_commit parameter would be that this extra validation should not be done in __init__ either and that, moreover, initializing a Submodule object should set its parent commit the same way as calling set_parent_commit.

But it seems to me that it would not be intuitive that constructing a Submodule object with an explicit binsha and explicit parent_commit could result in no error, but the constructed Submodule object immediately having an all-zeros binsha instead of the binsha passed. For this reason, I am reluctant to forge ahead in having Submodule.__init__ call Submodule.set_parent_commit (or equivalent) without further discussion.

Besides that, however, it does seem like the private _parent_commit attribute that these methods set is always expected, when not None, to be a Commit object and not, for example, a string. In particular, Submodule methods like config_reader need this. The config_reader method calls _config_parser_constrained, which calls _config_parser, which for unrecognized values (even when correct) ends up calling _sio_modules, which is incompatible with a str because it accesses the tree attribute of the parent commit object:

sio = BytesIO(parent_commit.tree[cls.k_modules_file].data_stream.read())

I get AttributeError if I set a string initially when constructing a Submodule object and then call config_reader.

Thanks for the update!

By the looks of it, fixing the code is a bit tougher than anticipated and it pains me to see you spend time on the submodule implementation, even though I know how it came to be naturally and that it is necessary. After all, the submodule implementation is likely broken in many ways, so it is my hope that any fixes only go as deep as they have to be in order to help with what initially triggered the investigation into submodules in the first place.

With that said, maybe adjusting the documentation would be easier at this point, and not necessarily (much) worse?

I think for now--not as a fix for this bug, but as a way forward that somewhat mitigates it at least from a documentation perspective and also allows #1859 to be finished--that it may be sufficient just to fix the type annotations based on what actually works. This will make clearer how they can be used, and updates to the docstrings or changes to their behavior could come later. I've done this in 1f03e7f.

It looks like this issue may have been inadvertently closed automatically as a result of the wording in 1f03e7f:

Even once that is done, this will not have fixed #1869

The phrase "will not have fixed #1869" contains "fixed #1869", which GitHub took to mean that the commit fixed #1869:

Screenshot of the commit viewed on GitHub in a web browser, with the mouse cursor over the word "fixed", showing the GitHub tooltip saying "This commit closes issue #1869"

I've done a bit more in #1877, which I think is the minimal further change necessary to consider this fixed (though it might be appropriate to leave it open even after that, since greater documentation changes and/or behavioral changes may be warranted).

Thanks for catching this! I merged #1877 and would be happy with keeping this issue closed if you agree. Otherwise it's definitely OK to keep it open as well.

I think it may be reasonable to consider #1877 (together with previous changes) to have fixed this, I'm not sure. I'm not advocating that it be reopened at this time.