gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.

Home Page:http://gitpython.readthedocs.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Git.version_info can drop intermediate fields unexpectedly

EliahKagan opened this issue · comments

The version_info attribute of Git instances drops non-numeric fields, even if they appear before a field that is kept:

tuple(int(n) for n in version_numbers.split(".")[:4] if n.isdigit()),

If a non-numeric field is followed by a numeric field, the effect is to make it look like the field that was actually non-numeric has the value of the subsequent field:

(.venv) ek@Glub:~/repos-wsl/GitPython (main $=)$ cat >fake-git <<'EOF'
> #!/bin/sh
> printf 'git version 1.2a.3 (totally fake)\n'
> EOF
(.venv) ek@Glub:~/repos-wsl/GitPython (main $%=)$ chmod +x fake-git
(.venv) ek@Glub:~/repos-wsl/GitPython (main $%=)$ python
Python 3.12.1 (main, Dec 10 2023, 15:07:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import git
>>> git.refresh('fake-git')
>>> git.Git().version_info
(1, 3)

(As shown, partially numeric fields that do not wholly parse to a number are non-numeric for this purpose.)

I think this should either be changed to stop before the first non-numeric field, or kept as it is but with this behavior documented explicitly to reduce confusion. In the latter case, the rationale should be given. The rationale may just be backward compatibility, though I am not sure if that is a sufficient reason to keep this behavior, since it seems unexpected in the cases where it makes a difference. There may be other alternatives that I haven't thought of.

I think this is a minor issue, since such versions may be rare or nonexistent. git version numbers that have a non-numeric field followed by a numeric field are common, such as 2.43.0.windows.1 on Windows. However, GitPython only looks at the first four fields, so the trailing 1 in 2.43.0.windows.1 is not wrongly presented as the field that follows the 0.

From the comments, it looks like this situation was specifically envisioned, suggesting that the exact current behavior may be intended:

# We only use the first 4 numbers, as everything else could be strings in fact (on Windows).

However, it seems more likely to me that it was done this way to preserve earlier logic while fixing a specific bug. The if n.isdigit() filter was added in 6a61110 (#195).

Full current code for context:

GitPython/git/cmd.py

Lines 814 to 826 in afa5754

def _set_cache_(self, attr: str) -> None:
if attr == "_version_info":
# We only use the first 4 numbers, as everything else could be strings in fact (on Windows).
process_version = self._call_process("version") # Should be as default *args and **kwargs used.
version_numbers = process_version.split(" ")[2]
self._version_info = cast(
Tuple[int, int, int, int],
tuple(int(n) for n in version_numbers.split(".")[:4] if n.isdigit()),
)
else:
super()._set_cache_(attr)
# END handle version info

This may make sense to fix at the same time as #1829 and/or #1830, I'm not sure.

Thanks for investigating this!

I think the behaviour shown here is clearly incorrect, no matter the intent, and thus a fix would be preferred. Even though such versions would be unusual, it's a question on how to correct them. Should they be turned into the closest-possible number? Or should parsing stop there so, in this example, there is only a single number present? I think these details are best left to the one implementing this.

I think that if the chosen approach is simple enough then it may reasonably be included in the same PR that fixes some other version_info related bugs. So my inclination is to regard fields like 2a as non-numeric for now and stop before them, with the idea that this behavior is reasonable, but may be changed. This is, after all, the closest correct behavior to the current behavior (which regards such fields as non-numeric and drops them yet still considers the next field).

It may be that fields such as 2a should be recognized as numeric in the future. If so, a decision will have to be made about what suffixes should be accepted, and whether parsing should stop at such a field or still go on to the next one. (The possibility of something like 8.9.2-rc.3 suggests that parsing should not go further, since parsing that the same as 8.9.2 may be okay, but going further would parse it the same as 8.9.2.3 which does not seem right, since 8.9.2-rc.3 is definitely a strictly lower version than 8.9.2.1, while 8.9.2.3 is definitely strictly higher than 8.9.2.1.)

This issue with non-numeric parts is also related, at least conceptually, to the way Git.version_info currently differs (and perhaps always will differ) from sys.version_info, which I have noted in #1830 (comment).