bureado / giternals-trees

Playing with git internals

Here I'm playing with git internals.

Selecting a subset of the tree

I'm using git ls-tree -r HEAD a/ab/5 c/3 b/2 to select a few objects from the tree. I then intend to use git mktree to create a tree from them, without associating a commit or anything "semantic" like that. My end goal is to point a ref at that subtree.

This works conceptually, but git mktree doesn't accept directory structures: it rejects entry names that contain slashes. So you need to recreate the structure yourself. Take the selected blobs in the deepest directories and mktree a new ("masked") tree for each, then work your way up, until a final git mktree at the root ingests only the blobs in the top-level directory plus the masked subtrees. The hash it prints is a tree that represents exactly the selected subset of objects.

(It's unclear to me whether appending to the tree would work.)
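A minimal sketch of that bottom-up construction, run against a throwaway demo repo whose layout mirrors the paths above (the file contents, and the extra unselected file a/ab/6, are made up for illustration). Listing each directory's tree with `git ls-tree HEAD:<dir> <name>` yields bare names that git mktree accepts; the resulting masked tree ids are then fed upward by hand with printf.

```shell
#!/bin/sh
set -eu

# Throwaway demo repo; paths mirror the example above, contents are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p a/ab b c
echo five > a/ab/5
echo six > a/ab/6        # hypothetical extra file that should be masked out
echo two > b/2
echo three > c/3
echo readme > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Deepest level first: list only the selected blob relative to its own
# directory, so the name has no slash and git mktree accepts it.
ab=$(git ls-tree HEAD:a/ab 5 | git mktree)

# One level up: "a" contains nothing but the masked "ab" tree.
a=$(printf '040000 tree %s\tab\n' "$ab" | git mktree)

# The other selected blobs live one level below the root.
b=$(git ls-tree HEAD:b 2 | git mktree)
c=$(git ls-tree HEAD:c 3 | git mktree)

# Root: only the masked subtrees (README.md was not selected).
subset=$(printf '040000 tree %s\ta\n040000 tree %s\tb\n040000 tree %s\tc\n' \
  "$a" "$b" "$c" | git mktree)

echo "$subset"
git ls-tree -r "$subset"
```

Every level is spelled out by hand here, which is the tedium described above; a script walking the selected paths would have to generate these steps.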

It's likely that going through the index file would make this easier, but that would mess "semantically" with the repo. See [this note] from the git-annex folks, who considered moving to mktree and ran into the same concern as above.
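One way to try the index idea without touching the repo's real index is to point git at a temporary index file through the GIT_INDEX_FILE environment variable. This is my substitution, and I haven't checked whether it addresses the git-annex concern. A sketch, assuming a made-up demo repo: `git update-index --index-info` accepts ls-tree output with nested paths, and `git write-tree` then creates the intermediate trees itself.

```shell
#!/bin/sh
set -eu

# Throwaway demo repo; layout and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p a/ab b c
echo five > a/ab/5
echo six > a/ab/6
echo two > b/2
echo three > c/3
echo readme > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# A not-yet-existing index path; git treats a missing index as empty.
tmp_index="$(mktemp -d)/index"

# Stuff the selected entries (full paths, slashes included) into it ...
git ls-tree -r HEAD a/ab/5 c/3 b/2 |
  GIT_INDEX_FILE="$tmp_index" git update-index --index-info

# ... and let write-tree build the nested trees bottom-up.
subset=$(GIT_INDEX_FILE="$tmp_index" git write-tree)

echo "$subset"
git ls-tree -r "$subset"
```

The repo's own .git/index is never read or written; only new tree objects land in the object store.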

I proceeded with the simplest example:

git ls-tree HEAD b | git mktree
b9645f5603b4efe0368929ccf313145818fdc245

git ls-tree b9645f5603b4efe0368929ccf313145818fdc245
040000 tree 9020154eb21da8cb3fd6860ec7065e962c446ed6	b

Also see Tree objects.

Referring to a subtree

A ref points to a commit, which points to a tree. Can we skip the commit? We start from:

git cat-file -p `cat .git/refs/heads/master`

tree 20bd0698e81a9df2d650df7a1de24e6366618df7
parent daec7dadfc2e1e23a855d301dd0d74eea3863faa
...

and this naive enumeration of the commit's root tree:

git ls-tree $(git cat-file -p `cat .git/refs/heads/master` | grep '^tree' | cut -f2 -d' ')
100644 blob f9e334c8a740ce3c72c8364c97c39fba0756d089	README.md
040000 tree ccf241bf540a57f540ca43bf9494c4365d54434a	a
040000 tree 9020154eb21da8cb3fd6860ec7065e962c446ed6	b
040000 tree 6e36c7dfb97e11e9e5877e4e366b7b18afa7a8be	c
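Reading .git/refs/heads/master by hand only works while the ref is stored as a loose file; after `git pack-refs` (which `git gc` runs) it may live only in .git/packed-refs, and the branch may not be called master at all. A sketch of the same lookup through git's own revision syntax, on a made-up throwaway repo:

```shell
#!/bin/sh
set -eu

# Throwaway demo repo; layout and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p a/ab b c
echo five > a/ab/5
echo two > b/2
echo three > c/3
echo readme > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Pack all refs, so the branch no longer exists as a loose file.
git pack-refs --all

# <rev>^{tree} peels the commit down to its root tree, wherever the ref lives.
root=$(git rev-parse 'HEAD^{tree}')
git ls-tree "$root"

# ls-tree peels commits on its own, so this prints the same listing.
git ls-tree HEAD
```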

I tried git update-ref refs/subtrees/simple b9645f5603b4efe0368929ccf313145818fdc245 (aka a lightweight ref), but how would it know that b96... is a tree? It turns out it does:

git for-each-ref
...
b9645f5603b4efe0368929ccf313145818fdc245 tree	refs/subtrees/simple
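The ref itself only stores an object id; the type that for-each-ref prints comes from the object, since every git object records its type in its header. A small sketch reproducing this on a made-up throwaway repo (only the ref name refs/subtrees/simple is taken from above). As far as I know, git refuses to point refs under refs/heads/ at non-commits, which a separate namespace like this sidesteps; and since refs are reachability roots, git gc should not prune the referenced tree.

```shell
#!/bin/sh
set -eu

# Throwaway demo repo; layout and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir b
echo two > b/2
echo readme > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Same "simplest example" as above: a tree holding only b/.
tree=$(git ls-tree HEAD b | git mktree)

# Point a ref straight at the tree; no commit is involved.
git update-ref refs/subtrees/simple "$tree"

# The type is read from the object the ref resolves to.
git cat-file -t refs/subtrees/simple
git for-each-ref --format='%(objectname) %(objecttype) %(refname)' refs/subtrees/

# Any command that takes a tree-ish accepts the ref name.
git ls-tree -r refs/subtrees/simple
```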

"Checking out" a subtree

In the next step, I'd like to change the working tree to the subtree, to limit the files that a process "sees" on disk. This ties in with my experiments on build systems that help "dereference" binary-to-commit-hash references. Imagine a large repo with a Makefile that creates many binary artifacts. Imagine that make can assert which inputs from the repo were used to build a particular artifact, then create a subtree and commit a ref to it. Now imagine that, in future runs, only this subtree is checked out for the corresponding make run.

(TODO and unclear if this would be with checkout <treeish> or read-tree or something else.)
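One possible answer to the TODO, as a sketch rather than a settled approach: materialize the subtree into a separate directory by loading it into a temporary index with read-tree and writing it out with `git checkout-index --prefix`, which leaves the repo's working tree and real index alone. The demo repo and output directory are made up for illustration; `git archive refs/subtrees/simple | tar -x -C <dir>` should achieve something similar.

```shell
#!/bin/sh
set -eu

# Throwaway demo repo; layout and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p a/ab b c
echo five > a/ab/5
echo two > b/2
echo three > c/3
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# A subtree ref holding only b/, built as in the simplest example.
git update-ref refs/subtrees/simple "$(git ls-tree HEAD b | git mktree)"

# Load the tree into a throwaway index (not .git/index) ...
tmp_index="$(mktemp -d)/index"
GIT_INDEX_FILE="$tmp_index" git read-tree refs/subtrees/simple

# ... and write its files under a separate directory. The trailing slash
# matters: --prefix is prepended to each path as a plain string.
out=$(mktemp -d)
GIT_INDEX_FILE="$tmp_index" git checkout-index -a --prefix="$out/"

# A process started in $out only "sees" the subset.
ls -R "$out"
```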

References
