Locally linear history: Transparently integrate a local patch-based workflow with a remote one-branch-per-change model.
Consider a stack of three commits representing a sequence of dependent changes:
* (HEAD) Remove old APIs
* Replace remaining users of old APIs with new APIs
* Add new APIs
* (origin/master) Delete failing tests
We wish to send these out for review, and we wish for the review periods to be open simultaneously: no waiting for the first commit to be merged before sending out the second for review. (Of course, the second commit cannot actually be merged into master until the first one is.)
This is a simple task in patch-based workflows. Projects like the Linux kernel and Git itself review code by sending raw patches to a mailing list, where updates to a commit are sent as replies to the thread. Systems like Phabricator and Gerrit similarly use diffs or changelists as the fundamental unit of change. But the task is harder to accomplish in systems that assume one branch per change, like GitHub and GitLab.
A simple approach is to push the first commit to a branch called `add-new-apis`, the second to a branch `replace-users`, and the third to `remove-old-apis`. Open three pull requests:
- PR #1 wants to merge `add-new-apis` into `master`.
- PR #2 wants to merge `replace-users` into `add-new-apis` (note: not `master`).
- PR #3 wants to merge `remove-old-apis` into `replace-users`.
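The pushes described above can be sketched with explicit refspecs. This is a minimal sketch, assuming the remote is named `origin` and that `HEAD` is the top of the three-commit stack; the function name `push_stack` is hypothetical and not part of any tool.

```shell
# push_stack: push each commit of the three-commit stack to its own branch.
# Assumptions: the remote is called "origin" and HEAD is "Remove old APIs".
push_stack() {
  # Each refspec is <commit>:<remote ref>, so no local branches are required.
  git push origin \
    HEAD~2:refs/heads/add-new-apis \
    HEAD~1:refs/heads/replace-users \
    HEAD:refs/heads/remove-old-apis
}
```

Running `push_stack` from the top of the stack creates the three remote branches. Opening the pull requests and choosing their base branches still happens on the hosting service.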
Once the first pull request is merged, we change the base branch of the second pull request from `add-new-apis` to `master`. Once that one merges, we likewise change the base of the third pull request from `replace-users` to `master`.
This is straightforward until we need to handle changes made during the review of the first pull request. We may modify the first local commit in an interactive rebase, which rewrites the subsequent two commits. The simplest way forward is to force-push each commit to its respective target branch after editing any commit in the sequence. This gets the job done, but has downsides broader than the general warnings against force-pushing to a shared repository.
After force-pushing to GitHub, some things just don’t work. Code review comments no longer properly hyperlink to the code in question. Notifications redirect to 404s and require manual dismissal. The pull request timeline events can collapse consecutive force-pushes, with no apparent way to view the intermediate states.
Furthermore, after pushing any commit to GitHub, any references in that commit’s headline or body will immediately be cross-linked. This includes numeric issue references (`#123`), URL references, and username references (`@account`). Thus, pushing multiple distinct commits with the same message causes both notification spam and issue timeline spam. The notification spam is annoying, but transient; the issue timeline spam persists forever, and can make the timeline quite difficult to navigate.
Thus, we adopt the following constraints.
First, we do not force-push. After making changes to a commit, we merge that branch into the branches of any downstream pull requests and send fast-forward updates to all of them. This is tedious to manage manually, especially while maintaining a locally linear history, which is why `git-dx` exists.
Second, we only push a human-authored commit message once per pull request; any further automated commits to the branch will have only short messages that should not contain cross-references.
The local commit message is still the source of truth for all information about the change. Its headline and body may periodically be used to update the title and body of the corresponding pull request. Automated commits to the branch will include a hash reference to the original commit. This will point to an object known only to the local repository, and is provided as a convenience and to make it harder to accidentally lose the source commit.
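One way to satisfy both constraints with Git plumbing is sketched below. It is an illustration, not necessarily how `git-dx` works internally: the function name `integrate`, its argument order, and the `[update]` message format are all assumptions. The idea is to create a new commit that carries the source commit's tree, has the old head of the target branch as its first parent (so pushing it is a fast-forward), and merges in the upstream branch head when that head is not already in its history.

```shell
# integrate SOURCE OLD_HEAD REMOTE_DIFFBASE
# Print the hash of a new commit suitable for a fast-forward push.
# Hypothetical helper; the message format below is an assumption.
integrate() {
  source_commit=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
  old_head=$2
  remote_diffbase=$3
  tree=$(git rev-parse --verify --quiet "$source_commit^{tree}") || return 1
  # Short automated message: a hash reference only, no cross-references.
  msg="[update] source commit $source_commit"
  if git merge-base --is-ancestor "$remote_diffbase" "$old_head"; then
    # The upstream head is already in the branch history: one parent suffices.
    git commit-tree "$tree" -p "$old_head" -m "$msg"
  else
    # Otherwise record a merge of the upstream branch head.
    git commit-tree "$tree" -p "$old_head" -p "$remote_diffbase" -m "$msg"
  fi
}
```

The resulting hash can then be pushed with an ordinary `git push origin <hash>:refs/heads/<target-branch>`; because the old head is an ancestor of the new commit, no `--force` is needed.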
Two commits are treequal if their trees are equal. (The tree of a commit can be found with `git rev-parse --verify COMMIT^{tree}`, replacing `COMMIT` with the commit hash or other unique identifier. The tree of a commit describes the full state of the repository’s content at that commit, but not the commit history or metadata.)
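The definition translates directly into a small shell check. The function name `treequal` below is chosen for illustration; only `git rev-parse` is used.

```shell
# treequal A B: succeed if commits A and B have identical trees.
# Returns 0 if treequal, 1 if not, 2 if either commit cannot be resolved.
treequal() {
  tree_a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
  tree_b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
  [ "$tree_a" = "$tree_b" ]
}
```

For example, a commit and an amended copy of it that differs only in its message are treequal, while a commit and its parent usually are not.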
A local commit in a linear history corresponding to a single change is called a source, or source commit.
A source commit must have a branch directive, which is a Git trailer whose value is the branch key. Prepending an optional branch prefix to the branch key gives the target branch name. For instance, a commit with trailer `Dx-branch: reticulate-splines` would specify a target branch name of `myname-reticulate-splines` if the branch prefix were configured as `myname-`. See `git-interpret-trailers` for more information about trailers in general.
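As a rough sketch, the target branch name can be derived from a commit with `git interpret-trailers --parse`. The helper name `target_branch` is hypothetical, and the prefix is passed as an argument because this text does not say where it is configured. The match on `Dx-branch` is case-sensitive here and, if the trailer appears more than once, the last value wins; both are simplifying assumptions.

```shell
# target_branch COMMIT [PREFIX]
# Print PREFIX followed by the branch key from the commit's Dx-branch trailer.
# Fails (status 1) if the commit has no such trailer.
target_branch() {
  commit=$1
  prefix=${2-}
  key=$(git show -s --format=%B "$commit" |
    git interpret-trailers --parse |
    sed -n 's/^Dx-branch: *//p' |
    tail -n 1)
  [ -n "$key" ] || return 1
  printf '%s%s\n' "$prefix" "$key"
}
```

With the example trailer above, `target_branch HEAD myname-` prints `myname-reticulate-splines`.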
The remote branch specified by a source commit is called the target branch. After a successful integration, the source commit and the head of the target branch will be treequal. The target branch should be specified as the head branch of the pull request.
The unique parent commit of a source commit is called the local diffbase.
The remote diffbase is the commit that should appear at the head of the remote branch used as the “base branch” of the pull request. The right choice for this commit is slightly fuzzy. If the local diffbase specifies a target branch and is not itself an ancestor of `origin/master`, then the remote diffbase is the head of the local diffbase’s target branch. Otherwise, the remote diffbase is simply the local diffbase. In any case, the remote diffbase should be treequal to the local diffbase.
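The selection rule above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the remote is assumed to be `origin` with its branches available as remote-tracking refs, the branch prefix is passed as an argument, and the source commit is assumed to have exactly one parent. If `origin/master` does not exist, the sketch treats the local diffbase as not yet merged.

```shell
# branch_key_target COMMIT [PREFIX]: target branch name from the Dx-branch
# trailer, or failure if the commit has none. Hypothetical helper.
branch_key_target() {
  key=$(git show -s --format=%B "$1" |
    git interpret-trailers --parse |
    sed -n 's/^Dx-branch: *//p' |
    tail -n 1)
  [ -n "$key" ] || return 1
  printf '%s%s\n' "${2-}" "$key"
}

# remote_diffbase SOURCE [PREFIX]: print the hash of the remote diffbase.
remote_diffbase() {
  prefix=${2-}
  # The local diffbase is the (assumed unique) parent of the source commit.
  local_diffbase=$(git rev-parse --verify --quiet "$1^") || return 1
  if branch=$(branch_key_target "$local_diffbase" "$prefix") &&
    ! git merge-base --is-ancestor "$local_diffbase" origin/master; then
    # Local diffbase is an unmerged change: use the head of its target branch.
    git rev-parse --verify --quiet "refs/remotes/origin/$branch"
  else
    # Local diffbase has no target branch or is already in origin/master.
    printf '%s\n' "$local_diffbase"
  fi
}
```

The sketch does not verify the final condition in the text; a caller could additionally compare the trees of the result and the local diffbase.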
In development. Not production-ready. No guarantees are made.
If you have set the `trailer.separators` config value to a set that does not contain a colon, then trailers may not be set correctly.
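A quick way to check for this situation is sketched below. The function name `separators_ok` is hypothetical; it relies only on `git config --get`, and treats an unset value as fine because Git's default separator is a colon.

```shell
# separators_ok: succeed if trailer.separators is unset or contains a colon.
separators_ok() {
  # git config --get exits non-zero when the key is unset; the default is ":".
  seps=$(git config --get trailer.separators) || return 0
  case $seps in
    *:*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `separators_ok || echo "trailer.separators lacks a colon"` prints a warning only when the configured set omits the colon.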
- Advice for clean history (Linus Torvalds, 2009-03-29, posted to the `dri-devel@lists.sourceforge.net` mailing list) (mirror at mail-archive.com)
- “Managing dependent pull requests” (William Chargin, 2017-07-28)