Why I chose Git

Wednesday, 2 Jan 2008

This post started life as an email. John Gabriele wrote me after a one-liner comment in which I recommended Git to him in a thread on use.perl, asking why I chose it over various other DVCSs. I didn’t have plans to write a novel, but things just kept coming to me. I didn’t need to edit much either; it mostly came out in this form – I guess I had more to say than I knew.

So, why did I?

Among the systems I did look into, there are really just two contenders: Git and Mercurial. All the other systems track metadata; Git and Mercurial just track content and infer the metadata.

By tracking metadata I mean that these systems keep a record of what steps were taken. “This file had its name changed.” “Those modifications came from that file in that branch.” “This file was copied from that file.” Tracking content alone means doing none of that. When you commit, the VCS just records what the tree looks like. It doesn’t care about how the tree got that way. When you ask it about two revisions, it looks at the tree beforehand and the tree afterwards, and figures out what happened in between. A file is not a unit that defines any sort of boundary in this view. The VCS always looks at entire trees; files have no individual identity separate from their trees at all.

As a consequence, whether you used VCS tools to manipulate your working copy or regular command line utilities or applied a patch or whatever is irrelevant. The resulting history is always the same.
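To make that concrete, here’s a rough sketch in Python of what “record the tree, infer the rest” can look like. It is a toy, not Git’s actual storage or rename detection: the `snapshot` and `infer_changes` functions and the file names are made up for illustration, and a rename is only recognised when the content is byte-for-byte identical.

```python
import hashlib

def snapshot(tree):
    """Record a tree as {path: content hash}. No record of edits is kept."""
    return {path: hashlib.sha1(data.encode()).hexdigest()
            for path, data in tree.items()}

def infer_changes(before, after):
    """Compare two snapshots and infer what happened between them."""
    removed = {p: h for p, h in before.items() if p not in after}
    added = {p: h for p, h in after.items() if p not in before}
    changes = []
    # A path that vanished while identical content appeared under a new
    # path is reported as a rename (toy rule: exact content match only).
    for old_path, digest in sorted(removed.items()):
        new_path = next((p for p, h in sorted(added.items()) if h == digest), None)
        if new_path is not None:
            changes.append(("renamed", old_path, new_path))
            del added[new_path]
        else:
            changes.append(("deleted", old_path))
    changes += [("added", p) for p in sorted(added)]
    changes += [("modified", p) for p in sorted(before)
                if p in after and before[p] != after[p]]
    return changes

# Hypothetical trees: nobody told the "VCS" that util.py was renamed.
rev1 = snapshot({"README": "hello\n", "lib/util.py": "def f(): pass\n"})
rev2 = snapshot({"README": "hello, world\n", "lib/helpers.py": "def f(): pass\n"})
changes = infer_changes(rev1, rev2)
print(changes)
```

Whether the file was moved with a VCS command, with plain mv, or by applying a patch makes no difference here, because the only input is the two trees.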

Another consequence, at least with Git, is that it can track the movement of things smaller than a file, e.g. a single function being moved from one file to another.

And that sub-file level tracking in Git is an example of how, if the VCS is improved and its tracking becomes more intelligent, your entire repository instantly benefits from this. A metadata tracking system can’t do that because the old part of your repository didn’t have the necessary metadata recorded. A file-based VCS can’t do that because it doesn’t have an innate understanding that there are interrelationships between files.
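A small sketch of that idea, again in Python and again only a toy: the snapshots below stand in for history that was recorded before anyone wrote the analysis, and `moved_blocks` is a hypothetical, much cruder stand-in for whatever Git really does. It treats blank-line-separated chunks as “functions” and only finds exact matches.

```python
def blocks(text):
    """Split file content into blank-line-separated blocks
    (a crude stand-in for functions)."""
    return [b.strip() for b in text.split("\n\n") if b.strip()]

def moved_blocks(before, after):
    """Find blocks that left one file and showed up in a different file."""
    moves = []
    for src, old_text in sorted(before.items()):
        still_there = blocks(after.get(src, ""))
        gone = [b for b in blocks(old_text) if b not in still_there]
        for block in gone:
            for dst, new_text in sorted(after.items()):
                if (dst != src
                        and block in blocks(new_text)
                        and block not in blocks(before.get(dst, ""))):
                    # Report the first line of the block as its label.
                    moves.append((block.splitlines()[0], src, dst))
    return moves

# Two old snapshots ({path: content}); no metadata about the move exists.
old = {
    "a.py": "def keep():\n    return 1\n\ndef helper():\n    return 2\n",
    "b.py": "def other():\n    return 3\n",
}
new = {
    "a.py": "def keep():\n    return 1\n",
    "b.py": "def other():\n    return 3\n\ndef helper():\n    return 2\n",
}
moves = moved_blocks(old, new)
print(moves)
```

The snapshots didn’t have to change for this to work. A smarter analysis written tomorrow can be pointed at trees recorded years ago.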

So that’s why the only contenders are Git and Mercurial.

And between the two, Git has the better repository format – it’s much more robust (designed to make changes inherently atomic) and despite all this, is simpler than the repo format of any other VCS. It’s also efficient – an entire repository with all of the project history will often take less space than a Subversion checkout. (But Git needs periodic, manually initiated vacuuming with “git gc”, say every couple of weeks or so, or else the repository will grow as you work; Mercurial’s repo format is slightly more efficient and it doesn’t have that issue. But it’s not inherently robust, and observing a repository while changes are under way will reveal inconsistent data.)

Another reason for me to choose Git is that it’s rapidly gaining popularity. Among the DVCSs, it’s probably the most popular, followed closely by Mercurial. (Git is used by the Linux kernel (of course) and Freedesktop.org; Mercurial was chosen by the Mozilla Foundation and Sun Microsystems.) The rest are stragglers.

Lastly, the thing that really swayed me is git-svn. This is an integration tool that lets you check out from and push back to a Subversion repository, while working in a Git repository on your own machine. The integration with Subversion is incredibly slick. This means you can use Git rather than SVK (which is a castle built on sand) to get your DVCSy goodness when working on a project that uses Subversion.

But I don’t do any fancy stuff with my VCS

You don’t have to use Git for anything fancy to appreciate it.

Have you ever run into the situation where you were working on something, and noticed that there was a small unrelated change you’d also like to make? What to do – dirty the current commit by including an unrelated change? Make a note and wait until after you’re done with the current work? Make a patch from the current changes, revert them, fix the small thing, commit, and reapply the patch? Continue working and use some kind of patch cherry-picking tool to untangle the changes later (hoping that no hunks will contain bits of both changes)? It’s a pain.

Fear no longer, though, for Git makes this easy: “git stash” puts away your current changes and reverts your working copy. You fix the little thing, commit, then “git stash apply” and “git stash clear”. Voilà! You can go back to working on the big change without any of the hassle – awesome.

By the way, notice how “git stash apply” and “git stash clear” are separate steps? That’s because you can stash away multiple changesets on top of each other, should you need to. It’s brilliant.
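If it helps to see why “apply” and “clear” are two different operations, here’s a toy model of the workflow in Python. `ToyRepo` and the file names are invented for illustration; real Git does a proper merge when applying a stash, while this sketch just overwrites files and ignores deletions and conflicts.

```python
class ToyRepo:
    def __init__(self, committed):
        self.committed = dict(committed)  # last committed tree
        self.working = dict(committed)    # working copy
        self.stashes = []                 # stack of stashes, newest last

    def stash(self):
        """Put away uncommitted changes and revert the working copy."""
        changes = {p: c for p, c in self.working.items()
                   if self.committed.get(p) != c}
        self.stashes.append(changes)
        self.working = dict(self.committed)

    def commit(self):
        """Record the working copy as the new committed tree."""
        self.committed = dict(self.working)

    def stash_apply(self):
        """Reapply the newest stash; it stays on the stack."""
        self.working.update(self.stashes[-1])

    def stash_clear(self):
        """Throw away all stashes."""
        self.stashes.clear()

repo = ToyRepo({"big.txt": "v1", "typo.txt": "teh"})
repo.working["big.txt"] = "v2 (half done)"   # the big change in progress
repo.stash()                                 # working copy is clean again
repo.working["typo.txt"] = "the"             # the small unrelated fix
repo.commit()
repo.stash_apply()                           # big change is back
stashes_after_apply = len(repo.stashes)
repo.stash_clear()
print(repo.committed, repo.working, stashes_after_apply)
```

After the last step the commit contains only the typo fix, the working copy has both the fix and the half-done big change, and the stash list is empty.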

You may be wondering: this is a nice feature and all, but what in particular does it demonstrate? The answer to that is that it’s only possible in the form it takes in Git because Git is a DVCS and as such concentrates on making it easy to merge branches. What you do with “git stash” is internally a branch and merge, though a lightweight one with a very simple user interface – something that Subversion cannot trivially emulate, simple as it may seem (whereas Mercurial has a very similar feature, called Mercurial Queues).

Another nice thing for which you don’t have to want to do anything fancy: because pushing your commits to your central project repository is a conscious act in a DVCS, not something that happens automatically for every commit, you can rewrite history with wild abandon as long as you haven’t published it for the world to see. You can easily remove or fix commits where you made a mistake. I occasionally forget to tell the VCS about all new files, f.ex., so some go missing from the commit (no matter how diligently I use “$vcs status” to check whether I forgot anything). Git in particular makes that sort of thing very easy to fix, which is fabulously handy.

There is more – more than I can think of off the top of my head right now. DVCS will make your life easier in tons of little ways even if you aren’t looking to work in a highly decentralised manner with lots of branches, and Git is rapidly picking up features to let you exploit the advantages of a DVCS in this low-key, everyday, get-out-of-the-way manner. (F.ex., that stash feature? New in 1.5, which has only been out for a little while.)

Isn’t Git just something Linus cooked up for the kernel?

The story started there, but a few pages have been turned since. Linus wanted a system that could support the kernel process, which in many respects is an order of magnitude (or several) more demanding than that of most other libre software projects. So he concentrated on writing a system that he would want to use, which meant setting down a rock solid foundation without regard for how easy it was for other people to pick up. The first cut was rough to use, as first cuts are wont to be, and that is what established its reputation.

By virtue of it being used for the kernel, though, Git got a large community very quickly. They have since been improving the software at breakneck speed, so today it is no longer any harder to understand or use than any other DVCS. Its rock solid foundation, in contrast, remains. Internally it hasn’t changed much since that very first cut.

So it is now a viable system for anyone. Well, almost – there’s no good native Windows port yet. However, see above with regard to breakneck pace. Work is progressing, and I expect msysgit to be eminently usable within a couple months.