An orphan olive branch to Mercurial

Saturday, 5 May 2012 [updated]

Git repository browsers have universally awful graph drawing algorithms. — For the longest time, one of my repositories has had two main branches, master and release. For a release, I would git merge --no-ff master into release. (Using --no-ff forces a commit on release even if release could be fast-forwarded to the current state of master. That way the act of cutting a release is always recorded in the repository.) Development happens on master, sometimes on branches. Topic branches are rebased before merging them back to master, once again using the --no-ff switch to record that a certain stretch of commits belonged to one topic together.
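For concreteness, here is a minimal sketch of that merge-based workflow, played out in a throwaway repository. The file names, commit messages, the `topic` branch name and the demo identity are all assumptions made up for the illustration; only the use of rebase and `git merge --no-ff` comes from the description above.

```shell
#!/bin/bash
set -e
# Throwaway repository with a placeholder identity (assumptions for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
echo base > base.txt; git add base.txt; git commit -q -m 'base'
git branch release

# A topic branch with two commits.
git checkout -q -b topic
echo t1 > topic.txt; git add topic.txt; git commit -q -m 'topic 1'
echo t2 >> topic.txt; git commit -q -am 'topic 2'

# Meanwhile, unrelated work lands on master.
git checkout -q master
echo m > other.txt; git add other.txt; git commit -q -m 'unrelated work'

# Rebase the topic, then merge it with --no-ff so the stretch stays visible.
git checkout -q topic
git rebase -q master
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

# Cut a release: --no-ff forces a commit on release even though
# release could simply be fast-forwarded to master.
git checkout -q release
git merge -q --no-ff -m "Merge 'master' into release" master

git log --graph --oneline master release
```

The final `git log --graph` shows the two-track shape: one merge commit on master closing the topic, and one merge commit on release recording the act of releasing.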

Essentially, this is a two-track history, with occasional short parallel side tracks on one side:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master
      \   \           \                    \   \       \   \
-------o---o-----------o--------------------o---o-------o---o    release

You would think that this would be easy to draw in a sane way.

And most of the time it is. But sometimes repository browsers decide to draw release on the other side of master. And as it happens, sometimes a topic falls by the wayside for a while. When these conditions coincide, drawing the stray heads from these topic branches and at the same time drawing release in such a way that the merge direction (from master into release) is correct suddenly requires snaking each release commit around all the previous ones. The result is a marshalling yard of parallel tracks (which I will not try to give an ASCII diagram of…) for representing what in reality is a very simple history. That makes it very difficult to make heads or tails of what really happened in the repository: a whole Black Forest out of just two trees.

There are some ordinary options to suppress this. The most obvious one would be to do a fast-forward merge of release back into master before picking up again. Doing so yields a triangular structure like this:

                           o--o--o--o
                          /          \
-o---o   o   o---o---o   /------------o---o---o   o   o---o   o---o  master
      \ / \ /         \ /                      \ / \ /     \ / \
-------o---o-----------o------------------------o---o-------o---o    release

Here there are no parallel tracks: the only unbroken track is the release branch, so no matter when and how any algorithm tries to draw this graph, it will be forced to string the commits into short side tracks alongside the release track. There is no likely way to turn this into a funhouse of illusory complexity.

Any solution that merges release into master in any way will have a very annoying drawback, however: you can no longer read the history of master without getting all of the release merges interspersed into it. This is all the worse if you never gave those merge commit messages much thought, because that means the history of release by itself consists of nothing but an endless row of “Merge 'master' into release”. And if that was not bad enough by itself, it gets really irritating during periods when most commits are released immediately: the noise takes up a major part of your commit log.

Then an epiphany disrupted my long-standing dissatisfaction with the situation.

This is what the history in my repository looks like now:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master

-------o---o-----------o--------------------o---o-------o---o    release

That’s right: no merges.

Yet again, release is a single unbroken track. But now so is master. And since the branches are unconnected, it is never necessary to arrange them relative to each other, so they will always be drawn properly. And the master commit log remains clean and readable.

What I have done is make release an orphan branch that shares no history with master (created with git checkout --orphan). To cut a release, I check out release, then I get the tree from the commit I want to release and put that in a new commit on release. Obviously with this scheme I need to manually record the commit ID somewhere to be able to know what state of master a particular release corresponded to – there is no longer merge metadata to keep track of that. The commit message seems a natural place to record that information. I need to construct one in any case since Git does not know how to provide a default message for these commits like it does when merging a branch. Of course, the extended commit message is also a good place to put a list of commits that are hitching a ride on this release. I decided to put a release version (in my case, a simple incrementing integer) in the commit message subject as well, to make it easy to refer to a particular release.
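Done by hand, the procedure might look roughly like the following sketch. It runs in a throwaway repository; the file name, the demo identity and the use of `git read-tree --reset -u` to pull in the tree are my assumptions for the illustration, not necessarily the exact commands used originally. The `<number> @ <commit ID>` subject follows the convention described above.

```shell
#!/bin/bash
set -e
# Throwaway repository with a placeholder identity (assumptions for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
echo one > file; git add file; git commit -q -m 'first'

# Seed the orphan branch: --orphan keeps the index and working tree of
# master but starts a history with no parent.
commit=$(git rev-parse master)
git checkout -q --orphan release
git commit -q -m "1 @ $commit"

# Development continues on master.
git checkout -q master
echo two > file; git commit -q -am 'second'

# Cut release 2 by hand: check out release, load the tree of the commit
# to be released into the index and working tree, and commit it.
commit=$(git rev-parse master)
git checkout -q release
git read-tree --reset -u "$commit"
git commit -q -m "2 @ $commit"
git checkout -q master

git log --format='%h %s' release
```

After this, release and master point at identical trees but share no commits, which is exactly what keeps the two tracks from ever being tangled in a graph.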

Needless to say, I have the process automated. This is my release script:

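#!/bin/bash
# Cut a release: commit the tree of the given commit (default: master)
# onto the orphan release branch, without touching the working tree.
set -e -o pipefail
commit=$(git rev-parse "${1-master}")
# The previous release subject reads "<num> @ <commit>"; split it into words.
read -r num junk oldcommit <<<"$(git log --no-walk --format=%s release --)"
(
  # Subject: the incremented release number and the released commit ID.
  printf '%d @ %s\n\n' $((++num)) "$commit"
  # Body: the commits hitching a ride since the previous release.
  git log --reverse --oneline --abbrev=12 --no-decorate --no-color "$oldcommit..$commit"
) \
| git commit-tree "$commit^{tree}" -p release \
| ( read -r new ; git update-ref refs/heads/release "$new" )
git push -f origin master release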

Aside from the hard linkage by commit ID, you also get a soft correlation by commit date if you ask git log and friends to use --date-order. This is sufficient for routine development work. Note that since the commit IDs are recorded, it is possible to use grafts to retrospectively (and possibly temporarily) make the orphan release branch look like a mergeful branch.
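Both kinds of linkage can be sketched in a throwaway repository. Note the swap: instead of the classic grafts file, this sketch uses `git replace --graft`, which is the mechanism current Git versions offer for the same purpose. The empty commits, their messages and the demo identity are assumptions for the illustration.

```shell
#!/bin/bash
set -e
# Throwaway repository with a placeholder identity (assumptions for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git commit -q --allow-empty -m 'work 1'
git commit -q --allow-empty -m 'work 2'

# Build an orphan release branch with "<num> @ <commit>" subjects.
c1=$(git rev-parse master~1)
c2=$(git rev-parse master)
r1=$(echo "1 @ $c1" | git commit-tree "$c1^{tree}")
r2=$(echo "2 @ $c2" | git commit-tree "$c2^{tree}" -p "$r1")
git update-ref refs/heads/release "$r2"

# Soft correlation: interleave both branches by commit date.
git log --date-order --format='%h %cd %s' master release

# Hard linkage: read the released commit ID back out of each subject and
# graft it on as an additional parent of the release commit.
for rel in $(git rev-list release); do
  read -r num junk released <<<"$(git log --no-walk --format=%s "$rel")"
  parents=$(git log --no-walk --format=%P "$rel")
  # $parents is left unquoted on purpose: it holds zero or more commit IDs.
  git replace --graft "$rel" $parents "$released"
done

# With the replacements in place, release looks like a mergeful branch.
git log --graph --oneline master release
```

The replacements live under refs/replace/ and can be dropped again with `git replace -d`, or ignored for a single command with `git --no-replace-objects`, so the mergeful view really can be temporary.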

A nice aspect of doing things this way is how easy it is to get a full diff of the total change represented by a release. With a merge-based release branch it takes fiddling to ask for that diff and enough knowledge to know how to.
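As a small sketch of what that looks like: every commit on the orphan branch has the previous release as its only parent, so the ordinary single-commit tools already show the whole release. The repository below is a throwaway; the file names, messages and demo identity are assumptions for the illustration.

```shell
#!/bin/bash
set -e
# Throwaway repository with a placeholder identity (assumptions for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
echo one > a.txt; git add a.txt; git commit -q -m 'add a'
c1=$(git rev-parse master)
echo two > b.txt; git add b.txt; git commit -q -m 'add b'
echo three >> a.txt; git commit -q -am 'extend a'
c2=$(git rev-parse master)

# Orphan release branch: release 1 at c1, release 2 at c2.
r1=$(echo "1 @ $c1" | git commit-tree "$c1^{tree}")
r2=$(echo "2 @ $c2" | git commit-tree "$c2^{tree}" -p "$r1")
git update-ref refs/heads/release "$r2"

# The total change represented by release 2, summary first, then in full.
git show --stat --format='release %s' release
git diff release~1 release
```

Here `git diff release~1 release` produces the same output as diffing the two released master commits directly, without anyone having to look up which commits those were.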

And so I seem to have arrived at a poor (technically awkward, functionally very limited) reinvention of Mercurial’s named branches, using the plumbing provided by Git. This may be the only true use case for named branches that I can think of.

Update: I’ve rewritten the script to use lower-level plumbing. It no longer even checks out the tree; it just directly creates a commit object based on the tree object of the released commit.