In praise of Git’s index

Monday, 25 Jan 2010

I still run into people lambasting Git for the concept of the index from time to time. It seems strange and superfluous to users of other VCSs – like a speed bump that serves no purpose. Why not just commit the changes in the working copy? This perception is understandable; when I first heard of Git, back as a Subversion user, I was one of these people.

How times and minds change. Today, I use it and rely on it so much that I can’t imagine moving to any other VCS that doesn’t have this concept. (And none of the contemporary contenders do.) Because of this, I keep responding to such criticism, repeating myself. I figured I should put my explanation down somewhere I can point people to.

So what is the index good for?

The key to understanding it is how it interacts with git diff. Once you add something to the index (also referred to as staging it), it disappears from the diff. You can pass --cached to see what changes you have staged, but by default, git diff doesn’t show you the changes that you have asserted are ready for commit. When I first read about this, it sounded outright stupid to me. Why would anyone want that?
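To make that concrete, here is a minimal sketch in a throwaway repository. The file names and the committer identity are made up for illustration; only the git commands themselves matter.

```shell
# Throwaway repository; a.txt, b.txt and the identity below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Change both files, then stage only a.txt.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt
git add a.txt

unstaged=$(git diff --name-only)           # what plain `git diff` still shows
staged=$(git diff --cached --name-only)    # what is staged for the next commit

echo "git diff:          $unstaged"
echo "git diff --cached: $staged"
```

Once a.txt is staged, only b.txt is left in the plain diff, and a.txt shows up under --cached instead.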

Turns out: because it is hugely helpful. Consider: when a merge fails, the successfully merged diff hunks are staged, but conflicted hunks are not – which means that git diff will show only conflicts, and the successfully performed part of the merge doesn’t cloud the diff. Furthermore, the way to mark files with conflicts as merged is to stage them after manual resolution, which makes them disappear from the diff as well. Maybe this is why Linus introduced the concept in the first place, given that the main part of his job is to perform merges all day long. But that’s far from the only circumstance in which the index has been useful to me.
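The merge case can be sketched the same way. This is only an illustration under made-up assumptions: a branch called side, two hypothetical files (one that merges cleanly, one that conflicts), and a "resolution" that just overwrites the conflicted file.

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > conflict.txt
printf 'base\n' > clean.txt
git add conflict.txt clean.txt
git commit -q -m "base"

# Side branch: touches both files.
git checkout -q -b side
printf 'side\n' > conflict.txt
printf 'side\n' > clean.txt
git commit -q -am "side changes"

# Back on the original branch: touch only conflict.txt, differently.
git checkout -q -
printf 'ours\n' > conflict.txt
git commit -q -am "our change"

# The merge stops on the conflict, so a non-zero exit status is expected here.
git merge side >/dev/null 2>&1 || true

# Only the conflicted file is left in the plain diff; sort -u guards against
# a path being listed more than once.
during=$(git diff --name-only | sort -u)
# The cleanly merged file is already staged.
merged=$(git diff --cached --name-only -- clean.txt)

# Resolve by hand (here: simply pick some content), then stage the result.
printf 'resolved\n' > conflict.txt
git add conflict.txt
after=$(git diff --name-only)

echo "during conflict: $during"
echo "already staged:  $merged"
echo "after staging:   ${after:-<nothing>}"
```

During the conflict the plain diff mentions only conflict.txt, and after staging the resolution it is empty, which is the signal that the merge is ready to be committed.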

The essence, already apparent in the above description but applicable much more widely than just during merges (which I don’t do a whole lot of, all things considered), is that the index introduces the idea of a known good part of a commit under construction.

Often, when I set out to make some self-contained change to the code, I don’t know up front the detailed approach of how I’ll go about it. I may also end up making incidental other changes – a small improvement to a utility library, a fix for a tiny bug I noticed while tooling about in its vicinity, stuff like that. I also sometimes end up changing direction a few times for some aspect or other of the change that I was originally planning to make.

Having the index available to me, I just keep working on things for however long I need to arrive at a clear picture, without worrying about commits. Afterwards, I start by reviewing the diff to see how to break down the work into chunks that will best make sense to whoever might read the patches later. Then I use git add --patch to gradually untangle changes from each other into separate logical steps. This command will even let you edit diff hunks for extra control, which I occasionally make use of to pull apart changes from multiple logical steps that ended up affecting the same line(s).

I’d say I end up making about 3-and-change commits on average out of non-trivial work units, along with a varying number of assorted one-liner commits that may get shuffled onto other branches. Yet I am free to get there any way I like, rather than being forced to painstakingly plan out the minutiæ of the work ahead of time. I keep harping on this, but it really matters to me. I love how much Git goes out of its way to get out of mine in this regard.

NB: if you work this way, then when the time comes to commit, you are making up commits that reflect states of the source code which never existed on disk before. So you don’t actually know whether the commit you are about to make is any good – a syntax error might have slipped in, say. Again, Git has just the ticket: it’s called git stash --keep-index. This stashes your changes but leaves the ones you have staged in place, so the code on disk ends up exactly the same as the index. Use this just before you commit, to run your tests. After committing, you apply the stashed changes back into the working copy using git stash pop, as always, and continue where you left off.
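The whole round trip might look roughly like this. It is a sketch with made-up file names, and the "run your tests" step is just a placeholder comment where your real test command would go.

```shell
# Throwaway repository; a.txt, b.txt and the identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Two changes in progress; only the one in a.txt is ready to commit.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt
git add a.txt

# Put the unstaged work aside so the files on disk match the index.
git stash --keep-index --quiet
on_disk_diff=$(git diff --name-only)   # empty: disk and index agree
b_during=$(cat b.txt)                  # b.txt is back to its committed state

# ... run your tests here, against exactly what will be committed ...

git commit -q -m "change a.txt"

# Bring the unfinished work back and carry on.
git stash pop --quiet
remaining=$(git diff --name-only)

echo "left to commit: $remaining"
```

The two changes live in separate files here to keep the example simple. When staged and unstaged hunks share a file, git stash pop may have more merging to do.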

In this kind of workflow, the meaning of git diff becomes “work I haven’t reviewed yet” or “work I don’t want to commit yet” and git diff --cached becomes “work I have vetted for inclusion in the next commit”. The index is what makes this possible.

I don’t know how I ever worked another way.