Who knows an XML document from a hole in the ground?

Monday, Jan 16, 2006, 02:21 (updated Tuesday, Jan 17, 2006, 13:57)

Hello back, everyone. (With a tip o’ the titular hat to Phil Ringnalda.) Some of you who’re seeing this will now be scampering to catch up on 20 of my entries, posted since the beginning of last December.

Reticent a person though I am, in general, I am not that reticent. For those of you who wondered why it’s gotten so silent around here, the reason is simple: because you didn’t notice that you need to file a bug report against the aggregator of your choice.

Yes, that’s right. The software you use is broken. And make no mistake, you’re not alone. There is a wide range of choices among bug trackers (or customer support forms), and their associated software, to file reports against.

This all happened because I tried to be a clever and good citizen and, in the process, save a bit of space in a smart way. But let’s backtrack a bit, and let me tell you what happened, and what the effect has been.

Until the beginning of December, the structure of my feed looked like this (and from today on looks like so again):

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>plasmasturm.org</title>
  <!-- additional feed metadata elided -->
  <!-- things such as subtitle, author etc -->
  <entry>
    <title type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Foo</div>
    </title>
    <summary type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Bar</div>
    </summary>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"><p>Baz</p></div>
    </content>
    <!-- additional entry metadata elided -->
  </entry>
  <!-- more entries follow -->
</feed>

You can see that there are xmlns="http://www.w3.org/1999/xhtml" bits strewn everywhere; in practice the effect is less drastic than it appears here where I’m eliding a bunch of Atom tags and any sensible content.

This setup means that in the document at large, the default namespace is http://www.w3.org/2005/Atom, so tags like <feed> are to be interpreted according to RFC 4287; but for the <div> tags and their contents, the default namespace is http://www.w3.org/1999/xhtml, so those tags are to be interpreted according to the XHTML Recommendation.
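
For the sceptical, here is a minimal sketch of how a namespace-aware parser sees that structure. It uses Python's standard xml.etree.ElementTree, which reports every element as {namespace}localname; the trimmed-down sample feed and the helper name expanded_names are mine, for illustration only.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the feed structure shown above (illustrative only).
NEUROTIC = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>plasmasturm.org</title>
  <entry>
    <title type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Foo</div>
    </title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"><p>Baz</p></div>
    </content>
  </entry>
</feed>
"""

def expanded_names(xml_text):
    """Return every element name, in document order, as '{namespace}local'."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]

names = expanded_names(NEUROTIC)
for name in names:
    print(name)
```

The <p> carries no declaration of its own; it simply inherits the default namespace that its enclosing <div> put in scope.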

This works perfectly in everything that claims support for Atom.

Then, at the beginning of December, I read an enlightening piece on XML citizenship by Joe English, titled A plea for Sanity. Its focus is how to structure XML documents with regard to namespaces such that the software that processes them can be kept simple, and it sets forth a few definitions of documents that are not sane. According to those definitions, my feed was neurotic: two different namespaces are mapped to the same prefix (i.e. the null prefix) at different points in the document.

I chose the neurotic structure I outlined above because it seemed logical to choose the Atom namespace as the default namespace for an Atom document at large. Since I author my weblog by editing a master feed (which contains the entire archive of the log) by hand, however, I definitely want the XHTML namespace to be the default for the sections of the document which contain HTML: having to write namespace prefixes in every tag would make the already tiresome experience of hand-written markup truly painful.

An added bonus is that despite the frequent repetition of the default namespace declaration, it actually saves space compared with declaring a prefix for the namespace once at the top of the feed and then having to write <h:p>Foo</h:p> everywhere. Across the whole feed, the two extra characters in every start and end tag easily add up to much more than the fixed cost of a namespace declaration in every text/content construct.

However, reading the plea for sanity inspired me to try something counterintuitive: declare the XHTML namespace as the default for the entire document and instead declare a prefix for the Atom namespace. This leads to a structure like so:

<a:feed xmlns:a="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/1999/xhtml">
  <a:title>plasmasturm.org</a:title>
  <!-- additional feed metadata elided -->
  <!-- things such as subtitle, author etc -->
  <a:entry>
    <a:title type="xhtml"><div>Foo</div></a:title>
    <a:summary type="xhtml"><div>Bar</div></a:summary>
    <a:content type="xhtml"><div><p>Baz</p></div></a:content>
    <!-- additional entry metadata elided -->
  </a:entry>
  <!-- more entries follow -->
</a:feed>

This is a sane XML document. And according to XML specifications, both this form of the document and the previous one are semantically exactly identical. They mean the exact same thing, and any compliant software which can process one of them will produce the exact same results given the other.
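
The equivalence can be checked mechanically. The following is a rough sketch, again with Python's standard xml.etree.ElementTree: it parses cut-down versions of both forms and compares the expanded element names, attributes and text. The helper name signature is hypothetical, and stripping surrounding whitespace from the text is a simplification of mine so that indentation does not count as a difference.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

# Cut-down versions of the two forms shown above (illustrative only).
NEUROTIC = f"""<feed xmlns="{ATOM}">
  <title>plasmasturm.org</title>
  <entry>
    <title type="xhtml"><div xmlns="{XHTML}">Foo</div></title>
    <content type="xhtml"><div xmlns="{XHTML}"><p>Baz</p></div></content>
  </entry>
</feed>"""

SANE = f"""<a:feed xmlns:a="{ATOM}" xmlns="{XHTML}">
  <a:title>plasmasturm.org</a:title>
  <a:entry>
    <a:title type="xhtml"><div>Foo</div></a:title>
    <a:content type="xhtml"><div><p>Baz</p></div></a:content>
  </a:entry>
</a:feed>"""

def signature(xml_text):
    """List (expanded name, attributes, stripped text) for each element."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, sorted(el.attrib.items()), (el.text or "").strip())
        for el in root.iter()
    ]

same = signature(NEUROTIC) == signature(SANE)
print(same)  # True: the prefixes differ, the expanded names do not
```

ElementTree throws away the prefixes and the xmlns declarations while parsing, which is exactly the point: to a namespace-aware consumer they are spelling, not content.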

It’s also worth noting that the number of Atom tags within an Atom entry is small and varies very little. So even though I have to prefix every Atom tag, this actually saves space compared with redeclaring the XHTML namespace over and over. (The master feed contains the entire archive and thus has a much higher HTML-to-Atom ratio than the newsfeed on the site, so the savings are particularly diminished in it; even so, it is currently about 1.2% smaller in its sane version than in the neurotic one. For the on-site feed, the savings are a tad larger; not much, but with frequently requested documents like newsfeeds, a penny saved is a penny got.)
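
A back-of-the-envelope sketch of that trade-off, in Python. The function names and the per-entry element counts are hypothetical, picked only to illustrate the arithmetic; they are not measurements of my feed, and the one-time cost of the extra declaration on the root element is ignored.

```python
# Cost of redeclaring the default namespace on one XHTML <div>.
XHTML_DECL = ' xmlns="http://www.w3.org/1999/xhtml"'
# Cost of prefixing one Atom tag.
PREFIX = "a:"

def neurotic_overhead(xhtml_constructs):
    """Bytes spent per entry on repeated XHTML namespace declarations."""
    return xhtml_constructs * len(XHTML_DECL)

def sane_overhead(paired_atom_elements, empty_atom_elements=0):
    """Bytes spent per entry on Atom prefixes.

    A paired element carries the prefix in its start and end tag;
    an empty element (such as a link) carries it once.
    """
    return len(PREFIX) * (2 * paired_atom_elements + empty_atom_elements)

# Hypothetical entry: title, summary and content as XHTML constructs,
# eight paired Atom elements and one empty one.
per_entry_neurotic = neurotic_overhead(3)
per_entry_sane = sane_overhead(8, 1)
print(per_entry_neurotic, per_entry_sane)
```

Each redeclaration costs as much as prefixing roughly nine paired Atom elements, so the prefixes stay cheaper for as long as an entry has only a handful of Atom tags per XHTML construct.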

I was very pleased with myself for figuring out a way to improve my feed while reducing its size in the process.

Too pleased.

As a matter of fact, within a few days, Scott Arrington got in touch on IRC and informed me that my feed was suddenly throwing an error in Safari. ‘Safari can’t open the page “feed://plasmasturm.org/feeds/plasmasturm/”’, it said, and misleadingly continued: ‘The error was: “unknown error” (NSURLErrorDomain:-1)’. (After all, the problem has nothing to do with a domain in the internet sense; though maybe “error domain” is Apple framework lingo for error type or error class.)

I shrugged. Surely, this was an outlier. Certainly, breakage like that wouldn’t go unnoticed. Likely, it would be fixed with the next batch of updates.

As a matter of fact, yes, it is nice to live with the illusion that the basics of XML are at least moderately well understood, in this year 2006 of the Lord. Thanks for asking.

Until tonight delivered a rude awakening from Jagath Narayan to my inbox. He informed me that my feed failed to parse in a number of aggregators. So I got on IRC and asked for Scott’s assistance in test-driving a few more Mac aggregators with the feed.

Here’s the list of known broken aggregators as of this writing:

All of the major browsers with feed support are broken in their latest version. How depressing.

Here’s a list of known working aggregators:

And here’s a test case. Please mail me further results, and of course, file bugs avidly.

So now my feed is back to its old, neurotic form, and works for a lot more people. It feels good to know I’m back, even though I never knew I was gone.