I just noticed and fixed an embarrassing bug in my feed generation stylesheet. To save on bytes for a document not really intended for humans anyway, it previously had
<xsl:strip-space elements="*" />
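Unfortunately, that did a little too much: it stripped every whitespace-only text node, including the significant ones between back-to-back tags in HTML content. Tag clusters like
<em>the</em> <a href="foo">bar</a>
would be smushed together as in
<em>the</em><a href="foo">bar</a>
so that the two words run into each other.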
The solution is awkward:
<xsl:template match="atom:*[ not( self::atom:content or self::atom:summary ) ]/text()[ normalize-space() = '' ]" />
This rule discards all whitespace-only text nodes that are direct children of elements in the Atom namespace, except for children of atom:content and atom:summary elements, so it doesn’t touch any of the whitespace in HTML content, avoiding the problem.
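A minimal sketch of the rule inside a complete stylesheet might look like this. The identity template and the binding of the atom prefix to the Atom 0.3 namespace URI are assumptions for illustration, not a copy of the actual feed stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://purl.org/atom/ns#">
  <!-- Assumption: atom is bound to the Atom 0.3 namespace. -->

  <!-- Assumption: an identity template that copies everything through. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: whitespace-only text nodes directly inside Atom
       elements produce no output, unless the parent is atom:content
       or atom:summary. -->
  <xsl:template match="atom:*[ not( self::atom:content or self::atom:summary ) ]/text()[ normalize-space() = '' ]" />

</xsl:stylesheet>
```

In a setup like this the empty template wins over the identity template without an explicit priority attribute, because a pattern with predicates gets a default priority of 0.5 while node() gets -0.5.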
Once I finally switch the feed to Atom 1.0, the rule will get a little bit simpler, because the wrapper div element dictated in that spec for Text Constructs of xhtml type means I can drop the [ not( self::atom:content or self::atom:summary ) ] predicate. But it’ll still be sort of awkward.
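For illustration, the simplified rule could look roughly like the following. This is only a sketch: it assumes the atom prefix is bound to the Atom 1.0 namespace (declared on the template itself here to keep the fragment self-contained) and that all HTML content sits inside the XHTML wrapper div.

```xml
<!-- Sketch of the Atom 1.0 variant. The XHTML wrapper div is not in the
     Atom namespace, so text nodes inside it never match atom:*/text(). -->
<xsl:template
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    match="atom:*/text()[ normalize-space() = '' ]" />
```

The only text nodes this removes from an xhtml Text Construct are the whitespace-only ones around the wrapper div, which carry no meaning.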
I wish various XSLT directives like xsl:strip-space could use full XPath expressions.
The funny (read: embarrassing) thing is that I spent the better part of a year thinking that a range of aggregators all had the same bug, because after all, the content looks right on my website as viewed with a browser. What was I thinking? Certainly not that, since the content on the site is generated by a different script, I might have a bug in the shorter one. Talk about a blind spot.
Looking at my feet, I acknowledge that I’m obviously not humble enough yet.