fabacab / git-archive-all.sh

A bash shell script wrapper for git-archive that archives a git superproject and its submodules, if it has any.

'--tree-ish' is broken

BenWiederhake opened this issue:

I don't understand what the following line (currently at https://github.com/fabacab/git-archive-all.sh/blob/master/git-archive-all.sh#L242 ) attempted to achieve:

TREEISH=$(git submodule | grep "^ .*${path%/} " | cut -d ' ' -f 2)

This is completely broken:

First of all, git submodule only lists the "direct" submodules, not the "transitive" ones. This may be related to #2. Consider using something like git submodule foreach --recursive pwd.

The grep part assumes that the submodule's current state is clean (the first character is a space for "clean", + for "changes made", etc.). That's not guaranteed. Indeed, git-archive-all.sh --tree-ish only really makes sense when the given tree-ish is different from HEAD.

The cut part tries to finish the regex matching that should have been done in grep; see grep -o.

It never uses the original --tree-ish argument at all.
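A minimal sketch of the parsing the grep/cut points above ask for, assuming the documented git submodule status line format: a one-character state prefix, the SHA-1, the path, and an optional describe string in parentheses. status_sha_for_path is a hypothetical helper name. It accepts any state prefix and compares the path field exactly instead of pattern-matching with grep and then cutting columns. It still reads the current status, so on its own it does not address --tree-ish.

```shell
# Hypothetical helper: read `git submodule status` lines on stdin and print the
# SHA-1 recorded for PATH. Each line looks like:
#   <state><sha1> <path>[ (<describe>)]
# where <state> is ' ' (in sync), '+' (checked-out commit differs from the
# recorded one), '-' (not initialized) or 'U' (merge conflicts).
# Usage: git submodule status | status_sha_for_path PATH
status_sha_for_path() {
    local want="${1%/}" line rest sha path   # status output has no trailing slash
    while IFS= read -r line; do
        rest=${line#?}            # drop the one-character state prefix, whatever it is
        sha=${rest%% *}           # first field: the SHA-1
        rest=${rest#* }           # remainder: path plus optional " (describe)"
        path=${rest% (*}          # strip the trailing " (describe)" if present
        if [ "$path" = "$want" ]; then
            printf '%s\n' "$sha"
        fi
    done
}
```

Matching the whole path field also avoids false hits such as b matching lib/b.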

> Consider using something like git submodule foreach --recursive pwd.

The git submodule foreach command did not exist when this code was written. IIRC, it wasn't available until about two years after this script was published.

I think this line passed the submodule's current HEAD commit to the submodule's git archive command. It's been years; this could probably use some updating.
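For comparison, the git submodule foreach --recursive suggestion could look roughly like this with a current Git. This is a sketch, not the script's code: list_submodule_paths is a hypothetical name, and it assumes a Git new enough to export $displaypath to the foreach command (older releases do not provide it). Note that foreach only visits submodules that are checked out, and by itself it still says nothing about --tree-ish.

```shell
# Hypothetical helper: print the path of every checked-out submodule, including
# nested ones, relative to the current directory (run it from the superproject root).
list_submodule_paths() {
    # Single quotes: $displaypath is expanded by the shell that foreach spawns
    # inside each submodule, not by the calling shell.
    # --quiet suppresses the "Entering '<path>'" lines.
    git submodule --quiet foreach --recursive 'printf "%s\n" "$displaypath"'
}
```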

In short, is there no way to know the exact submodule commit at the parent repo's target commit?

Let me illustrate.


Consider the following case:

git archive-all -t a01 archive.tar
[repo A]                        [repo B] (submodule)

                                <uncommitted change>
commit a02 (HEAD)   ----------> commit b02 (HEAD)
commit a01 (TARGET) ----------> commit b01

git submodule returns something like:

+__HASH_FOR_THE_UNCOMMITTED_CHANGE__ b (heads/master)

git submodule --cached fixes the problem of pointing to the <uncommitted change>, but
it only returns the submodule commit associated with the parent repo's current commit:

Here repo A's HEAD is a02, which is bound to b02 of repo B,
so git submodule --cached returns:

+b02xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx b (b02xxxx)

To archive -t a01, you need to know the commit of repo B bound to it, b01,
but there's no way to find it unless you check out the target commit a01, right?

Of course, you shouldn't change the state of the working directory.
So you can't know the exact submodule status in the past.


The change to use git submodule --cached is below.
As explained above, it does not solve everything.

--- a/git-archive-all.sh
+++ b/git-archive-all.sh
@@ -249,9 +249,9 @@ fi
 if [ $VERBOSE -eq 1 ]; then
     echo -n "archiving submodules..."
 fi
-git submodule >>"$TMPLIST"
+git submodule --cached >>"$TMPLIST"
 while read path; do
-    TREEISH=$(grep "^ .*${path%/} " "$TMPLIST" | cut -d ' ' -f 2) # git submodule does not list trailing slashes in $path
+    TREEISH=$(grep "^.* ${path%/} " "$TMPLIST" | sed -e 's/^.//' | cut -d ' ' -f 1) # git submodule does not list trailing slashes in $path
     cd "$path"
     rm -f "$TMPDIR"/"$(echo "$path" | sed -e 's/\//./g')"$FORMAT
     git archive --format=$FORMAT --prefix="${PREFIX}$path" $ARCHIVE_OPTS ${TREEISH:-HEAD} > "$TMPDIR"/"$(echo "$path" | sed -e 's/\//./g')"$FORMAT

Also, git submodule status lists uncommitted submodules.
This is another problem.

That is, a submodule that has been added with git submodule add, but whose addition has not been committed yet.

[repo C]                        [repo A]                        [repo B] (submodule)

commit c03 (HEAD) <------------ <uncommitted change>            <uncommitted change>
                                commit a02 (HEAD)    ---------> commit b02 (HEAD)
                                commit a01 (TARGET)  ---------> commit b01

In this case, git submodule status lists the submodule being added, c, regardless of the --cached option:

% git submodule
+__HASH_FOR_THE_UNCOMMITTED_CHANGE__ b (heads/master)
 c03xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx c (c03xxxx)
% git submodule --cached
+b02xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx b (b02xxxx)
 c03xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx c (c03xxxx)

Ultimately, we need a command that reports the submodules' status at a specific commit.
Without such a command, --tree-ish cannot be fixed.

For now, sad to say, you'd better not rely on --tree-ish;
instead, check out the target commit yourself and archive HEAD.
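The checkout workaround can be done without touching the current working directory by using a temporary detached worktree. This is a sketch under assumptions: run_at_treeish is a hypothetical helper, it relies on git worktree (the remove subcommand needs Git 2.17 or later), and git submodule update --init --recursive inside the worktree clones submodules from the URLs in .gitmodules, which may need network access.

```shell
# Hypothetical helper: run COMMAND inside a temporary, detached worktree checked
# out at TREEISH (with submodules at the commits TREEISH records), then clean up.
# The caller's checkout and submodules are left untouched.
# Usage: run_at_treeish TREEISH COMMAND [ARGS...]
run_at_treeish() {
    local treeish="$1" base wt status
    shift
    base=$(mktemp -d) || return 1
    wt="$base/worktree"
    # Check out TREEISH, detached, into a separate directory.
    if ! git worktree add --quiet --detach "$wt" "$treeish" >/dev/null; then
        rmdir "$base"
        return 1
    fi
    # In a subshell: populate submodules for that commit, then run the command.
    (
        cd "$wt" &&
        git submodule update --init --recursive >/dev/null &&
        "$@"
    )
    status=$?
    # --force is required once submodules have been initialized in the worktree.
    git worktree remove --force "$wt"
    rmdir "$base"
    return "$status"
}
```

For example, assuming git-archive-all.sh is on PATH, run_at_treeish a01 git-archive-all.sh "$PWD/archive.tar" archives HEAD of the temporary checkout. The output path is absolute because the command runs inside the worktree.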

How about calling git ls-tree for each submodule path obtained by git submodule status? (see #42)
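A minimal sketch of the git ls-tree idea, assuming the documented ls-tree output format (<mode> SP <type> SP <object> TAB <path>), where a submodule appears as a gitlink with mode 160000 and type commit. gitlink_sha and submodule_commit_at are hypothetical names. In the example above, submodule_commit_at a01 b would print b01 without checking anything out, and a path that is not recorded in the target commit (such as the not-yet-committed c) prints nothing.

```shell
# Hypothetical filter: read `git ls-tree` lines on stdin and print the object
# name of every gitlink (submodule) entry, ignoring blobs and trees.
gitlink_sha() {
    local mode type sha path
    while read -r mode type sha path; do
        if [ "$mode" = 160000 ] && [ "$type" = commit ]; then
            printf '%s\n' "$sha"
        fi
    done
}

# Hypothetical helper: the submodule commit that TREEISH records for PATH.
# Usage: submodule_commit_at TREEISH PATH   (run from the superproject root)
submodule_commit_at() {
    git ls-tree "$1" -- "${2%/}" | gitlink_sha
}
```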

> How about calling git ls-tree for each submodule path obtained by git submodule status? (see #42)

git ls-tree did not work well for sub-submodules (recursively nested submodules).
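One way to get part of the way there, sketched under assumptions: after reading the gitlink from the parent's tree, run git ls-tree again inside the submodule's own repository at that recorded commit. This only works when the submodule is checked out locally and already contains the commit, so it resolves nested submodules "as far as it can". list_gitlinks_at is a hypothetical name, and paths that ls-tree quotes (unusual characters) are not handled.

```shell
# Hypothetical helper: print "<sha> <path>" for every gitlink recorded in TREEISH
# of the repository at REPO_DIR, descending into nested submodules when possible.
# Usage: list_gitlinks_at TREEISH [REPO_DIR] [PATH_PREFIX]
list_gitlinks_at() {
    local treeish="$1" repo="${2:-.}" prefix="${3:-}"
    local mode type sha path
    # ls-tree -r lists gitlinks (mode 160000) but never descends into them.
    git -C "$repo" ls-tree -r "$treeish" | while read -r mode type sha path; do
        [ "$mode" = 160000 ] && [ "$type" = commit ] || continue
        printf '%s %s\n' "$sha" "$prefix$path"
        # Recurse only if the submodule is checked out (has a .git) and already
        # has the recorded commit; otherwise deeper levels stay unresolved.
        if [ -e "$repo/$path/.git" ] &&
           git -C "$repo/$path" cat-file -e "$sha^{commit}" 2>/dev/null; then
            list_gitlinks_at "$sha" "$repo/$path" "$prefix$path/"
        fi
    done
}
```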

Instead, git submodule --recursive --cached now exists.
Perhaps this would work?

> Instead, git submodule --recursive --cached now exists. Perhaps this would work?

Unfortunately, it was not complete, either.

It only reports the submodule commits bound to the top repo's HEAD.

I updated the PR to use:

  • git ls-tree, if available
    (top repo's direct submodules, and also non-direct ones as far as it can)
  • otherwise, git submodule --recursive --cached
  • if neither succeeds, the submodule's HEAD
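The fallback order above could be sketched as follows. This is an illustration under assumptions, not the PR's actual code: resolve_submodule_treeish is a hypothetical name, it must run from the superproject root, it uses the documented form git submodule status --cached --recursive for the second step, and paths containing spaces are not handled.

```shell
# Hypothetical sketch of the three-step fallback for one submodule path.
# Usage: resolve_submodule_treeish TREEISH PATH   (run from the superproject root)
resolve_submodule_treeish() {
    local treeish="$1" path="${2%/}" sha
    # 1. Gitlink recorded for PATH in the requested superproject tree-ish.
    #    Only direct submodules appear here; nested paths like b/c do not.
    sha=$(git ls-tree "$treeish" -- "$path" 2>/dev/null |
          awk '$1 == "160000" && $2 == "commit" { print $3 }')
    # 2. Commit recorded in the index, recursively; reflects the current
    #    state rather than TREEISH. Drop the state prefix, match the path field.
    if [ -z "$sha" ]; then
        sha=$(git submodule status --cached --recursive 2>/dev/null |
              awk -v p="$path" '{ line = substr($0, 2); split(line, f, " "); if (f[2] == p) print f[1] }')
    fi
    # 3. Nothing found: let the submodule archive its own HEAD.
    printf '%s\n' "${sha:-HEAD}"
}
```

In the diff's loop, the result could stand in for ${TREEISH:-HEAD}. Nested submodules fall through to step 2, which is exactly the "current HEAD only" limitation noted above.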