On tags (in the taxonomy sense) in Atom

Sunday, 4 Feb 2007 [Monday, 5 Feb 2007]

In Tag Scheme?, Tim Bray wonders how to make it explicit that a particular atom:category element is a tag (in the taxonomy sense; i.e. he wants to tag an entry “tagging” or “ethiopia” or “lisp”). This leads him to come up with a new URI scheme, urn:tag:. I didn’t comment on his post initially because the idea seemed so obviously ill-conceived that I assumed there must be something I was missing. But after following the comments and seeing his update and comment, I still can’t see the merit.

Above all, I fail to see what kind of information one can infer from the presence of a scheme attribute with urn:tag: in it that is in any way distinguishable from the information inferable from the absence of a scheme attribute. Given that Atom provides no way to specify any relationship whatsoever between two category elements, I fail to see how you can express anything other than tags with an atom:category element in the general case (where the consumer has no built-in knowledge of the scheme).

Incidentally, for my del.icio.us-API-to-Atom converter I chose to associate each tag with the entry for a link in the following way:

<category scheme="http://del.icio.us/ap/" term="atom" />

That seems far more useful to me than using urn:tag: would be, because tags really convey two different pieces of information: primarily their string value, but secondarily which person used them to describe a particular thing. Losing the secondary piece of information makes the primary piece much less interesting. Even the mere knowledge that a particular tag was used by someone on del.icio.us rather than someone on YouTube is valuable, because the major audience at each site is quite different. For this reason, http://www.tbray.org/ongoing/Tag/ as the scheme in Tim’s feed would be far more useful than the mute and featureless urn:tag:.
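To make the idea concrete, here is a minimal sketch of how a converter might emit one category element per tag, with the tagger's URL as the scheme. It is not the actual converter code; the function name tags_to_categories and the sample tag list are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def tags_to_categories(entry, user_url, tags):
    """Append one atom:category per tag, using the tagger's URL as the scheme.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    for tag in tags:
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}category",
            {"scheme": user_url, "term": tag},
        )
    return entry


# Serialize Atom elements without a namespace prefix.
ET.register_namespace("", ATOM_NS)

entry = ET.Element(f"{{{ATOM_NS}}}entry")
tags_to_categories(entry, "http://del.icio.us/ap/", ["atom", "tagging"])
print(ET.tostring(entry, encoding="unicode"))
```

The scheme carries the "who said it" part and the term carries the string value, so neither piece of information is lost.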

Aside from that, it seems to me the entire argument has been conclusively debunked by Norm Walsh. I think the desire that Norm argues against arises because HTTP URIs are explicitly opaque and people are told they shouldn’t try to parse them; so they want a way to communicate that, say, http://isbn.nu/1558607013 refers to ISBN 1558607013, or likewise that http://www.tbray.org/ongoing/Tag/Tagging refers to the tag “Tagging.” Such uses are probably more appropriately addressed in the general case by providing a URI template to specify how to extract the embedded information.

But in this specific case there is another possibility: to do the same thing as in HTML:

<a rel="tag" href="http://www.tbray.org/ongoing/Tag/Tagging">Tagging</a>

This translates directly to Atom:

<link rel="tag" href="http://www.tbray.org/ongoing/Tag/Tagging" title="Tagging" />

Here, the tag relationship effectively implies a URI template which says the last part of the path component is the tag.
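A consumer that understands the tag relationship could recover the tag roughly as follows. This is only a sketch: the function name tag_from_link is invented here, and ignoring a trailing slash and percent-decoding the segment are my assumptions rather than anything a spec mandates.

```python
from urllib.parse import unquote, urlsplit


def tag_from_link(href):
    """Return the last non-empty path segment of href, percent-decoded.

    Hypothetical helper; returns None when the path has no segments.
    """
    path = urlsplit(href).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return None
    return unquote(segments[-1])


print(tag_from_link("http://www.tbray.org/ongoing/Tag/Tagging"))
```

The point is that the consumer needs no knowledge of the site itself, only of the convention that the relationship implies.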

Of course, there is no registered tag relationship in Atom, and since RFC 4287 is stricter than the HTML recommendation about relationships, in reality it would either have to be registered first (which is the same overhead as for conjuring the urn:tag: scheme into existence) or everyone would have to agree on a particular full URI to use for the relationship. Well, and there’s the minor detail that no current aggregator on the planet would support this type of link for a while to come… however, support for special treatment of categories with the urn:tag: scheme (if, as I wrote above, there even is any useful support to provide) wouldn’t spread any sooner anyway. The only difference is that one style has a default presentation in aggregators and the other does not.

In the meantime, however, no one says you can’t simply put the HTML version right there in the entry content (which, hey, gives you Technorati support as a bonus, right this instant) and a schemeless category tag in the entry next to the link (which, hey, gives you as much aggregator support as you’ll ever get in the category element, right this instant).

Update: Edward O’Connor compares a pile of possible tag representations in Atom, including the one I outlined above. To do so, he sets down a list of criteria, three of which are relevant here:

  4. Would provide a dereferenceable URI to something about the tag. In the typical blog context, a blog post tagged “cat” should have a link to a list of other posts on the same blog tagged “cat.” It would be especially awesome if this link were available in Atom processors unaware of this tagging technique.
  6. It should be possible for an Atom processor to know that this is a tag and not some other thing, without local knowledge of the site in question.
  7. It should be possible for an Atom processor to extract the (normalized) tag from the element in which it’s stored without parsing some attribute value or element content into pieces.

These do not seem sensible to me as stated. As I argued above, #6 is impossible. Unless the consumer has local knowledge of the particular scheme in use, it cannot tell hierarchical categories from a flat tag space, because Atom does not provide any means to relate any “category” to any other.

As well, Atom does not even define a relation between scheme and term. What this means is that criteria #4 and #7 are in direct contradiction with each other. The convention most people have adopted is to use scheme as a prefix for the term, so that you get a dereferenceable URI by concatenating the two, as in Tim Bray’s feed, where http://www.tbray.org/ongoing/What/ is the scheme and, say, Technology is the term. However, nowhere in RFC 4287 is this mandated. So in order to have something dereferenceable to satisfy requirement #4, the term would really have to be http://www.tbray.org/ongoing/What/Technology. But this runs afoul of requirement #7: it is then not possible to supply just the tag by itself without relying on the consumer microparsing an attribute value.
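For illustration, the concatenation convention amounts to something like the sketch below. To be clear, this encodes a habit, not a rule from RFC 4287; the function name category_uri is made up, and percent-encoding the term before appending it is my own assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def category_uri(category_xml):
    """Guess a dereferenceable URI for a category by appending term to scheme.

    Hypothetical helper. RFC 4287 does not define this relation between
    scheme and term; this only follows the common convention.
    """
    category = ET.fromstring(category_xml)
    scheme = category.get("scheme")
    term = category.get("term")
    if scheme is None or term is None:
        return None
    # Assumption: the term is percent-encoded as a single path segment.
    return scheme + quote(term, safe="")


print(category_uri(
    '<category scheme="http://www.tbray.org/ongoing/What/" term="Technology" />'
))
```

A consumer that does this is relying on exactly the kind of local knowledge that the criteria were meant to rule out.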

Just to be clear here: the way I’d represent a tag in Atom is just the way Edward settles on:

<category scheme="http://www.tbray.org/ongoing/Tag/" term="tagging" label="Tagging" />

The contortions I went through in the initial post are in response to Tim’s objection that the presence of scheme should signify membership in some form of controlled vocabulary.