Transparent opaque changeable permanent URIs

Tuesday, 6 Dec 2005

I have spent a lot of time lately musing about good URI design, trying to resolve a vexing contradiction in generating good slugs (i.e. the last part of the URI, which ultimately identifies a particular resource):

  1. URIs should never change. Having previously existing URIs return 404 Not Found is anathema.

    This prohibits deriving the slug from any part of the resource’s content (which would usually be the title): the content may change, but the slug must not. Assuming a database-driven website, the easiest way to achieve permanence is to store an opaque, unique token with each record (commonly simply a consecutive number) reserved solely for use as the slug.

  2. Slugs are extremely valuable space for textual context. The link itself is the one piece of link data that is always available (a truism, of course) – someone pasting a link in an email is not likely to include the page title, f.ex., so having readable, meaningful slugs is a great boon.

    Search engines also give huge priority to words found in the URI, which makes good slugs a very good idea if you care about your ranking.

    The most generally applicable way to address this is to derive the slug programmatically from the title of the resource.
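Deriving a slug from a title can be sketched roughly as follows. This is only an illustration, not the code behind this site: the function name slugify and the exact rules (ASCII only, lowercase, hyphens between words) are assumptions made for the example.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Derive a lowercase, hyphen-separated slug from a title (illustrative rules)."""
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Remove hyphens left over at either end.
    return slug.strip("-")


print(slugify("Patently Braking"))
```

Because the result depends entirely on the title, editing the title changes the slug, which is exactly the conflict with condition #1.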

How do you satisfy both of these antithetical conditions at once? So far I have punted on #2 and satisfied only #1, for the simple reason that URI permanence overrides all other considerations in URI design. But I’ve always been dissatisfied with this state of affairs.

One suggestion I saw long ago was to generate slugs as compounds of an opaque token and a string generated from the title of the resource, where the string is essentially meaningless to the server. When the server attempts to serve a request, it simply uses the opaque token and disregards everything else. Under such a scheme, one of my permalinks might be /log/356-patently-braking/, but it wouldn’t matter whether you actually requested that or just /log/356/ or, for that matter, /log/356-pink-elephants/. Only the number determines which resource you actually receive. That does prevent editing the title of a resource from breaking existing links, but at the price of URI proliferation: different people will bookmark different things, a spider can request /log/356-mumble/ variations all day long and will be told 200 OK until the server is blue in the face, and you get all the other drawbacks of multiple equivalent addresses for the same resource. So I discarded this idea long ago as untenable, and moved on to find a solution that works differently; to find a solution that works, period.
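To make the proliferation concrete, here is a minimal sketch of how such a server might pick the record out of a request path. The /log/<number>-<anything>/ layout is taken from the examples above; the pattern and the function name record_id are hypothetical.

```python
import re
from typing import Optional

# Assumed layout: /log/<number>/ or /log/<number>-<anything>/
LOG_PATH = re.compile(r"/log/([0-9]+)(?:-[^/]*)?/")


def record_id(path: str) -> Optional[int]:
    """Return the opaque numeric token, ignoring whatever follows it."""
    match = LOG_PATH.fullmatch(path)
    return int(match.group(1)) if match else None


# All of these name the same record, so all of them would be served as 200 OK.
for path in ("/log/356-patently-braking/", "/log/356/", "/log/356-pink-elephants/"):
    print(path, record_id(path))
```

Every string after the number is accepted, so the set of valid addresses for one resource is effectively unbounded.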

It turns out, though, that that scheme gets very close to the best solution, which, as always, is obvious in retrospect: if the client requests /log/356-pink-elephants/, the server should respond not by sending the document as 200 OK, but by redirecting to /log/356-patently-braking/ with 301 Moved Permanently. In other words, resources are identified solely by their opaque tokens, but a client requesting a resource using a non-current compound slug will first be redirected to the address with the currently correct compound slug.
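The decision the server has to make can be sketched like this. It is a framework-free illustration under stated assumptions: CURRENT_SLUGS is a hypothetical stand-in for the database, and respond returns a status code plus the Location value instead of writing a real HTTP response.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the database: opaque token -> current title slug.
CURRENT_SLUGS: Dict[int, str] = {356: "patently-braking"}

# Assumed layout: /log/<number>/ or /log/<number>-<anything>/
LOG_PATH = re.compile(r"/log/([0-9]+)(?:-[^/]*)?/")


def canonical_path(token: int) -> str:
    """Build the one address that is currently allowed to answer 200 OK."""
    return f"/log/{token}-{CURRENT_SLUGS[token]}/"


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value or None)."""
    match = LOG_PATH.fullmatch(path)
    if match is None:
        return 404, None
    token = int(match.group(1))
    if token not in CURRENT_SLUGS:
        return 404, None
    canonical = canonical_path(token)
    if path == canonical:
        return 200, None       # the current compound slug: serve the document
    return 301, canonical      # anything else: point at the current address


print(respond("/log/356-pink-elephants/"))
print(respond("/log/356-patently-braking/"))
```

Note that only the current slug is stored. If the title is edited, CURRENT_SLUGS[356] changes and the old address starts answering 301 by itself, with no list of former addresses to maintain.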

This beautifully solves the problem with changing, derived titles:

The URI may change, but any client looking for the old address will be told to please retrieve the new one, even without the server having to keep records of previously used addresses. At the same time, the web is never broken because at any one point in time there is only one URI for a resource which will yield a 200 OK response.