Wrapping up XSLT word wrapping
At the turn of the year, Dave Brotherstone wrote me to let me know that my implementation of word wrapping in pure XSLT is buggy. He included a fixed version which I did not feel comfortable with because it was much longer than the original and had some obvious remnants of experimentation. However, his approach was conceptually correct, as I discovered after independently examining and fixing my code and thus gaining the understanding I needed to assess his version. I wound up with an equivalent to his code, though with one significant control flow refactor. This is what I got:
<!-- Copyright 2010 Aristotle Pagaltzis; under the MIT licence --> <!-- http://www.opensource.org/licenses/mit-license.php --> <xsl:template name="wrap-string" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:param name="str" /> <xsl:param name="wrap-col" /> <xsl:param name="break-mark" /> <xsl:param name="pos" select="0" /> <xsl:choose> <xsl:when test="contains( $str, ' ' )"> <xsl:variable name="first-word" select="substring-before( $str, ' ' )" /> <xsl:variable name="pos-now" select="$pos + 1 + string-length( $first-word )" /> <xsl:choose> <xsl:when test="$pos > 0 and $pos-now >= $wrap-col"> <xsl:copy-of select="$break-mark" /> <xsl:call-template name="wrap-string"> <xsl:with-param name="str" select="$str" /> <xsl:with-param name="wrap-col" select="$wrap-col" /> <xsl:with-param name="break-mark" select="$break-mark" /> <xsl:with-param name="pos" select="0" /> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$first-word" /> <xsl:text> </xsl:text> <xsl:call-template name="wrap-string"> <xsl:with-param name="str" select="substring-after( $str, ' ' )" /> <xsl:with-param name="wrap-col" select="$wrap-col" /> <xsl:with-param name="break-mark" select="$break-mark" /> <xsl:with-param name="pos" select="$pos-now" /> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <xsl:if test="$pos + string-length( $str ) >= $wrap-col"> <xsl:copy-of select="$break-mark" /> </xsl:if> <xsl:value-of select="$str" /> </xsl:otherwise> </xsl:choose> </xsl:template>
The original code had two problems, both involving its use of the
pos parameter, which keeps track of how many characters from the string have already been consumed, in order to decide when to wrap. The mistakes:
When the wrapper outputs a space between words, the space is not accounted for in
pos; an off-by-one error. This is the bug I spotted.
The value of
posis not reset when breaking a line, so it is relative to the start of the overall string, not relative to the start of the current line; a logic bug. This is what Dave spotted. I didn’t even realise this was the case – it is so obvious that
posmust be line-relative that when I started reading my code, I simply assumed it was.
In the old code, the wrapper checks whether a wrapping point has been reached at the start of an iteration, and if so, outputs
break-mark, otherwise a space. Then it always outputs
first-word (the non-whitespace at the start of the remainder of the string) and finally it recurses, passing the remainder of the string, giving
$pos + length( $first-word ) as the new
To fix the first bug, the value passed as
pos would have to depend on whether a space or
break-mark was output. Due to the purely functional nature of XSLT, I found no good way to express this without code duplication. (In an imperative language, I would just increment
pos in the conditional block that outputs a space. In XSLT I could see no other way than to test the same condition once in order to know whether to output a space or
break-mark, and another time in order to figure out whether to add 1 or 0 to
pos.) It seemed that fixing the error would be significantly detrimental to the clarity of the code.
I got stuck on this for quite a while.
Eventually, I found a sleight of hand that also allows some conceptual simplification, so thankfully, neither brevity nor clarity suffered too much for the result. The trick is to check whether the line is about to overflow, and if so, output
break-mark but not
first-word, passing down the string to the next recursion unchanged, effectively spinning the loop in place for one iteration. This folds the start-of-line case into the code path for the start-of-string case, and allows unconditionally passing
0 as the value for
pos for both of them, which avoids duplication of conditionals – and just so happens to change the semantics of
pos such that Dave’s bug is inadvertently also fixed. The form of the conditionals required by this semantic change is also nicely simple compared to those I had before.
In the end, the new code turned out slightly longer than the original and also harder to understand, though with the decided advantage of, y’know, working correctly. Fortunately it is nowhere near as bad as I feared it might become when I started on it, and in terms of conditionals, actually got much clearer.