ishepard / pydriller

Python Framework to analyse Git repositories

Home Page: http://pydriller.readthedocs.io/en/latest/

from_commit filter working incorrectly

IP1102 opened this issue · comments

Describe the bug
I was analyzing the Apache Maven Core repository (https://github.com/apache/maven). I wanted to analyze only the commits from from_commit = "388e659a17d23cba75afd93336181f768248efa3" up to the latest commit. This commit was made on 2010-09-26. But when I pass this filter value to the Repository class, instead of starting from 388e659a17d23cba75afd93336181f768248efa3, the traversal starts from f9dca87cad2671becaac63df47077bf8660b9d33, which was made on 2005-12-12. There are an extra 1286 commits between the actual start and the intended start.

To Reproduce
To reproduce this, follow these steps:

  1. Clone the Apache Maven repository (https://github.com/apache/maven)
  2. Write a small script like the one below:
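    from pydriller import Repository

    # Replace with the local path to the cloned repository
    repo_path = "<path_to_cloned_repo>"

    repo = Repository(repo_path,
                      include_refs=False, include_remotes=True,
                      from_commit="388e659a17d23cba75afd93336181f768248efa3")

    for idx, commit in enumerate(repo.traverse_commits()):
        # Position and date of the intended starting commit
        if commit.hash == "388e659a17d23cba75afd93336181f768248efa3":
            print("ID", idx)
            print(commit.committer_date)

        # First commit actually returned by the traversal
        if idx == 0:
            print("Oldest-", commit.hash)
            print("Oldest-", commit.committer_date)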

OS Version:
Windows/Linux

Thanks for flagging this and for making it easy to reproduce.
The command that Pydriller runs is:

git rev-list --reverse --remotes ^388e659a17d23cba75afd93336181f768248efa3^ HEAD --

You can see it by setting the logging level to DEBUG.
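A minimal sketch of turning on debug output with Python's standard logging module, assuming (as the comment above suggests) that the git command is emitted through it at DEBUG level. It would go before the Repository object is created.

```python
import logging

# Attach a stderr handler to the root logger with a compact format.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# Lower the root threshold so DEBUG messages from all loggers are shown.
logging.getLogger().setLevel(logging.DEBUG)
```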

The problem seems to be --remotes. If you remove that, everything works as expected.

Now, the question is why this flag changes the behaviour of Git. I'd have to deep dive into forums and see if someone already discussed this :)
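A possible explanation, offered as a sketch rather than a confirmed diagnosis: in git rev-list, --remotes adds every remote-tracking ref as an additional starting point alongside HEAD, while ^388e659a17d23cba75afd93336181f768248efa3^ only excludes commits reachable from that commit's parent. Commits on other remote branches that are not ancestors of the excluded commit would therefore still be listed, however old they are. The toy model below uses an invented five-commit graph (the names A to D and X are hypothetical) to illustrate the set arithmetic. It does not run Git and ignores ordering.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical history: main line A <- B <- C <- D (HEAD),
# plus an old side commit X that branches off A and lives only on a remote ref.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "X": ["A"],
}


def reachable(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Return every commit reachable from the given tips, tips included."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def rev_list(parents: Dict[str, List[str]],
             include: Iterable[str],
             exclude: Iterable[str]) -> Set[str]:
    """Model 'rev-list <include> ^<exclude>' as a set difference."""
    return reachable(parents, include) - reachable(parents, exclude)


# from_commit = C, so the exclusion "^C^" is C's parent, B.
without_remotes = rev_list(PARENTS, include=["D"], exclude=["B"])
with_remotes = rev_list(PARENTS, include=["D", "X"], exclude=["B"])

print(sorted(without_remotes))
print(sorted(with_remotes))
```

In this model the extra tip X is not an ancestor of B, so it survives the exclusion even though it predates C, which would match the symptom of old commits appearing before the intended start.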

Hope this helps!

Thank you for the explanation. Removing --remotes works for the Maven repository, but somehow it does not work for others. For example, consider this repository, where I want to get the commits starting from f3b628f8f9c71a2cdfa052025c4a1ed78ee4c45d. Running the command without the --remotes flag generates commits starting from 3bfbc107eac92f388de9f8b87682c3a0baf74981.

After diving deep into the documentation and also getting help from the community, I believe the appropriate command for this scenario is git rev-list --reverse --ancestry-path=f3b6 f3b6^! HEAD. So is there a way to override the GitPython function to generate this command?
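Until the library supports it, one workaround is to run the command directly. The sketch below does that with subprocess, outside PyDriller and GitPython. The helper names ancestry_path_args and commits_from are hypothetical, and the --ancestry-path=<commit> form is assumed to need a reasonably recent Git (I believe 2.38 or later).

```python
import subprocess
from typing import List


def ancestry_path_args(start: str, end: str = "HEAD") -> List[str]:
    """Build the argument list for the rev-list command quoted above."""
    return [
        "git", "rev-list", "--reverse",
        f"--ancestry-path={start}",  # keep only commits on a path through start
        f"{start}^!",                # start itself, excluding all of its parents
        end,
    ]


def commits_from(repo_path: str, start: str, end: str = "HEAD") -> List[str]:
    """Run the command inside repo_path and return hashes, oldest first."""
    result = subprocess.run(
        ancestry_path_args(start, end),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout.split()
```

Calling commits_from("<path_to_cloned_repo>", "f3b628f8f9c71a2cdfa052025c4a1ed78ee4c45d") should then return the hashes to compare against what traverse_commits() yields.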

The latest release should work! Let me know if you encounter other issues!