Humble pie

Saturday, 6 Aug 2005

I just noticed and fixed an embarrassing bug in my feed generation stylesheet. To save on bytes for a document not really intended for humans anyway, it previously had

<xsl:strip-space elements="*" />

Unfortunately, that did a little too much: it also stripped the significant whitespace between back-to-back tags in HTML content. Tag clusters like <em>the</em> <a href="foo">bar</a> would be smushed together, as in <em>the</em><a href="foo">bar</a>.

The solution is awkward:

<xsl:template match="atom:*[ not( self::atom:content or self::atom:summary ) ]/text()[ normalize-space() = '' ]" />

This rule discards all whitespace-only text nodes that are direct children of elements in the Atom namespace (except for content and summary elements), so it doesn’t touch any of the whitespace in HTML content, avoiding the problem.
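To show where the rule sits, here is a minimal sketch of a complete stylesheet, not the actual feed stylesheet. It assumes an identity transform as the surrounding context and binds the atom prefix to the Atom 0.3 namespace; both of those are assumptions added for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the identity template and the namespace binding are assumed. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://purl.org/atom/ns#">

  <!-- Identity template: copy every node and attribute through by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: whitespace-only text nodes directly inside Atom
       elements other than content and summary produce no output. -->
  <xsl:template match="atom:*[ not( self::atom:content or self::atom:summary ) ]/text()[ normalize-space() = '' ]" />

</xsl:stylesheet>
```

Because the second pattern carries predicates, its default priority (0.5) beats the identity template's node() pattern (-0.5), so no explicit priority attribute should be needed. Text nodes inside the HTML elements have non-Atom parents, and text directly inside content or summary is excluded by the predicate, so both pass through the identity template untouched.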

Once I finally switch the feed to Atom 1.0, the rule will get a little bit simpler, because the wrapper div element dictated in that spec for Text Constructs of xhtml type means I can drop the [ not( self::atom:content or self::atom:summary ) ] predicate. But it’ll still be sort of awkward.
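As a rough sketch of that simplification, assuming the feed uses only xhtml Text Constructs and the atom prefix is bound to the Atom 1.0 namespace (the surrounding identity template is again an assumption for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes Atom 1.0 with type="xhtml" content and summary. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">

  <!-- Identity template: copy every node and attribute through by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- The only direct children of content and summary are now the wrapper
       div and the whitespace around it, so no exception is needed. -->
  <xsl:template match="atom:*/text()[ normalize-space() = '' ]" />

</xsl:stylesheet>
```

Everything significant lives inside the wrapper div, whose descendants are in the XHTML namespace rather than the Atom one, so the pattern never reaches them.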

I wish XSLT directives like xsl:strip-space accepted full XPath expressions rather than just a list of name tests.

The funny (read: embarrassing) thing is that I spent the better part of a year thinking that a range of aggregators all had the same bug. After all, the content looks right on my website as viewed with a browser. What was I thinking? Certainly not that, since the content on the site is generated by a different script, I might have a bug in the shorter one. Talk about a blind spot.

Looking at my feet, I acknowledge that I’m obviously not humble enough yet.

Referenced in The quantum leap.