ishepard / pydriller

Python Framework to analyse Git repositories

Home Page: http://pydriller.readthedocs.io/en/latest/

Repository.traverse_commits() with from_tag and to_tag does not yield any commits

Darnol opened this issue

Hi there. I'm new to PyDriller and have a question regarding the behaviour of the Repository class when fetching commits with from_tag and to_tag specified. I'm not sure whether this is a bug or whether I'm using the package incorrectly.

Description

I want to analyze all commits in the Kafka repository (https://github.com/apache/kafka) from tag 3.5.1 to tag 3.6.0.

Here is the code I used

from pydriller import Repository

for commit in Repository("https://github.com/apache/kafka.git", from_tag="3.5.1", to_tag="3.6.0").traverse_commits():
    print(commit.hash)

# Does not yield any commits

I had a look at the code and tried to debug it. I think it comes down to this: the Repository instance tries to fetch the commits with the following git command:

git rev-list --reverse --ancestry-path=2c6fb6c54472e90ae17439e62540ef3cb0426fe3 ^2c6fb6c54472e90ae17439e62540ef3cb0426fe3^ 60e845626d8a465a8cfe68bb2d7d4b88d622634e --

If I run this command, I do indeed get an empty list of commits. My understanding of git rev-list is rather limited, but could it be that the --ancestry-path option is being used incorrectly here?

If I remove this option and run

git rev-list --reverse ^2c6fb6c54472e90ae17439e62540ef3cb0426fe3^ 60e845626d8a465a8cfe68bb2d7d4b88d622634e --

I get the expected list of all commits reachable between commit 2c6fb6c54472e90ae17439e62540ef3cb0426fe3 (the commit for tag 3.5.1) and commit 60e845626d8a465a8cfe68bb2d7d4b88d622634e (the commit for tag 3.6.0).
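For anyone who wants to reproduce the comparison, here is a minimal Python sketch that builds the two rev-list invocations quoted above and counts the commits each one returns. The helper names build_rev_list_args and list_commits are hypothetical and are not part of PyDriller. The sketch assumes a local clone of the repository and a git version that accepts the --ancestry-path=<commit> form.

```python
import subprocess
from typing import List


def build_rev_list_args(from_commit: str, to_commit: str,
                        ancestry_path: bool = True) -> List[str]:
    """Build the argument list for the two `git rev-list` variants in this issue.

    Hypothetical helper for illustration only; not part of PyDriller.
    """
    args = ["git", "rev-list", "--reverse"]
    if ancestry_path:
        # Restrict the result to commits on the ancestry path of from_commit.
        args.append(f"--ancestry-path={from_commit}")
    # "^<commit>^" excludes the parent of from_commit and everything before it.
    args += [f"^{from_commit}^", to_commit, "--"]
    return args


def list_commits(repo_path: str, from_commit: str, to_commit: str,
                 ancestry_path: bool = True) -> List[str]:
    """Run the command inside a local clone and return the commit hashes."""
    result = subprocess.run(
        build_rev_list_args(from_commit, to_commit, ancestry_path),
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


# Example usage (assumes a clone at /tmp/kafka, which is a placeholder path):
# start = "2c6fb6c54472e90ae17439e62540ef3cb0426fe3"  # tag 3.5.1
# end = "60e845626d8a465a8cfe68bb2d7d4b88d622634e"    # tag 3.6.0
# print(len(list_commits("/tmp/kafka", start, end, ancestry_path=True)))
# print(len(list_commits("/tmp/kafka", start, end, ancestry_path=False)))
```

Keeping the argument construction separate from the subprocess call makes it easy to print the exact command and paste it into a shell.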

Expected Behaviour

I would expect to get a generator over all commits between tag 3.5.1 and tag 3.6.0 from the kafka repository.

Additional Information

OS Version macOS (13.2.4)
PyDriller 1.5.2
git 2.42

Hi!
Looking at the repo, I can see that the commits in that branch are not merged into master. This means that by traversing only master (the default in PyDriller), you will never reach those commits, hence the empty result.

To fix this, you could run it with include_refs=True, which makes the traversal include all branches, even those that are not merged into master:

for commit in Repository("/tmp/kafka/", from_tag="3.5.1", to_tag="3.6.0", include_refs=True).traverse_commits():
    print(commit.hash)
2c6fb6c54472e90ae17439e62540ef3cb0426fe3
4ca2c84a2ba2afb4bd35b44d65339d3c2129b60b
9f9cfbd6fabfb50cd4e9dacb4e8b9d88265f91e6
b7dc50b659f7f1d5cd0fc3b2dded6115841be70c
3f7c0c83d6f761a49d12d182beec9c416ac96012
9fc01628b53217f8946020eaf0acfd2a1649a6a6
eada8846cc13a35b3b95b1b8bf2485ded50aec6e
48ba000678b76d8698c6db5d84752816f70be6b5
b5d7a4a6154c30463c6c41f89bce5c23c36d0809
31cec6fd55b1a280abac4a31d725508ac2667367
0e739901887b5a0195e274a1085c7de43d9c104b
1440e16f22ae0d0dd9369c8269381fe7652faedc
24d2c2d2e25cbdaa6b1b6ba19403287709f81682
1c568069329a392c22932786925e8489018e00fe
34a30fff5769bae55f978631e7a44f59631ee8a6
12fc0d04d71d29aec13ca0d41ee35c905e8b03c0
abd1c8e46fc413f8cb8d2d07a80ee74eaf5d9708
a9ccb8562eeed6dd287c138796afdc0603daed3b
714b4ebeeb9a87c9f121b24eeacf38f377a9b60a
fffbab7951eb375927e53fc5b57580354c24fb8c
7e3f1c198dbc79a3e576bbd89b70efc2409c4db8
3b9c3da97847518c57863cbff8a9bef2fd1f7ada
0492a3bc87abf89462eddff0ace62eaad54612fa
6f55160175a270f4c7f32c5673775c1e6e9a2d43
c252e930fb7294cfba987e5826ff6a0c1eabb500
1966f51d6212a1e4efa14003ad1c708b35d7b0db
ea206a3d36770d94289f6e6e6113c50c267791ae
9a818d2ca743379e65c68959ab526266c963a88f
fb85e9d4aafa3cc079d9fd7a4e0ac751ab9ac088
b8cf3e31747f7193024c36f3381c0dd5bd22158c
67032b8ecfa00a9b403045ce3f1219755365a9d2
6bc36eaf2370a077ebfe2ccbc55aa2d0bceae020
e28c1befc3cefc9f918565ebe6317c54f4ed8122
51a7acda25ce6f6939dee299587661bec4411557
00a1b9f769aecb78cf208ef2351dd328b47d0c7f
8f7310eab6cb73da6d8ab4893354bf945be78ed0
1fc067ff2eac580452f859bddb2f8b29b64cd2e1
c55e89a6003c0646f58611e60613afb6ad2ab77c
0cadf0db714a5f0bea52e160217198c1233cf0a1
b3b457bf1ba272aed730c6b78cba38480cd2e5eb
319dc61de79927288c9d13cede8456c14e5702a0
0c90b6557eaae0b8036263ab907e2a6213ea82ea
ccdffd6e4fd97a4cdcdf5ef790684eeca8e32bf7
8b4369c57384102d422f0a87e47b6e4f9e99fbe0
add9dc3340fde5c4bd74b1459e89309f51c233a0
d260329056c6c89920255d71a6773dcdd2c1039d
303c86f7a5ca2736138b76476928abb22869dfe9
d769f1dd87b8101f234e30bff9e86fd177943197
a6a893796e695e09e87ce36d883b0efcba3e99ba
ad925d2582f9aea638b7b8952764ce3976a6b87c
9e10f8959aece8dd259c9e8e1994ce7dc3bd6fa4
f682ecf9db63b4944a09eb05ad2a31b8d2510767
749df0716318dee1d5fd7bf2daae4e9650c70a3a
86d4022d488431f71e121609ae83f1672fd989a9
c7eae56dfa17746bea242b81027208164ba66ae7
5829fca0a7f65cd0b0c8f8afdb5a28124d4ae7b7
3e5ec6fd7171fdfc479da0d9cc8f59f012e6d73f
79de845bd55b153cabcc54f0f2e9fc688f36d5d7
3c8ca01cefb4bbaec787b716cd929e4e7da1b512
9aeaa5dc18ba9a4d6e9cf4a77c6e079e24b13a80
c9a648880570e197aa465a4cbd0e158a3cfa981f
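One way to check the explanation above, that the tagged commits are not reachable from the branch being traversed, is git merge-base --is-ancestor. The following is a rough sketch rather than anything PyDriller does internally. The helper names is_ancestor_args and is_ancestor are hypothetical, and the clone path and branch name in the usage comment are placeholders that may differ in your checkout.

```python
import subprocess
from typing import List


def is_ancestor_args(commit: str, ref: str) -> List[str]:
    """Build the git command that tests whether `commit` is an ancestor of `ref`.

    Hypothetical helper for illustration only; not part of PyDriller.
    """
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def is_ancestor(repo_path: str, commit: str, ref: str) -> bool:
    """Return True if `commit` is reachable from `ref` in the clone at repo_path."""
    result = subprocess.run(is_ancestor_args(commit, ref), cwd=repo_path)
    if result.returncode == 0:
        return True   # commit is an ancestor of ref
    if result.returncode == 1:
        return False  # commit is not an ancestor of ref
    # Any other exit status signals an error, e.g. an unknown commit or ref.
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")


# Example usage (path and branch name are assumptions):
# tag_commit = "60e845626d8a465a8cfe68bb2d7d4b88d622634e"  # tag 3.6.0
# print(is_ancestor("/tmp/kafka", tag_commit, "master"))
```

If this returns False for a tag's commit, a traversal limited to that branch would not be expected to reach it.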

Hi there, thank you so much for having a look! That makes sense of course, thanks for the solution :)