Wrapping up XSLT word wrapping

Thursday, Jul 15, 2010, 22:22 (updated Thursday, Jul 29, 2010, 16:04)

At the turn of the year, Dave Brotherstone wrote me to let me know that my implementation of word wrapping in pure XSLT is buggy. He included a fixed version which I did not feel comfortable with because it was much longer than the original and had some obvious remnants of experimentation. However, his approach was conceptually correct, as I discovered after independently examining and fixing my code and thus gaining the understanding I needed to assess his version. I wound up with an equivalent to his code, though with one significant control flow refactor. This is what I got:

<!-- Copyright 2010 Aristotle Pagaltzis; under the MIT licence -->
<!-- http://www.opensource.org/licenses/mit-license.php -->
<xsl:template name="wrap-string" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="str" />
    <xsl:param name="wrap-col" />
    <xsl:param name="break-mark" />
    <xsl:param name="pos" select="0" />
    <xsl:choose>

        <xsl:when test="contains( $str, ' ' )">
            <xsl:variable name="first-word" select="substring-before( $str, ' ' )" />
            <xsl:variable name="pos-now" select="$pos + 1 + string-length( $first-word )" />
            <xsl:choose>

                <xsl:when test="$pos > 0 and $pos-now >= $wrap-col">
                    <xsl:copy-of select="$break-mark" />
                    <xsl:call-template name="wrap-string">
                        <xsl:with-param name="str" select="$str" />
                        <xsl:with-param name="wrap-col" select="$wrap-col" />
                        <xsl:with-param name="break-mark" select="$break-mark" />
                        <xsl:with-param name="pos" select="0" />
                    </xsl:call-template>
                </xsl:when>

                <xsl:otherwise>
                    <xsl:value-of select="$first-word" />
                    <xsl:text> </xsl:text>
                    <xsl:call-template name="wrap-string">
                        <xsl:with-param name="str" select="substring-after( $str, ' ' )" />
                        <xsl:with-param name="wrap-col" select="$wrap-col" />
                        <xsl:with-param name="break-mark" select="$break-mark" />
                        <xsl:with-param name="pos" select="$pos-now" />
                    </xsl:call-template>
                </xsl:otherwise>

            </xsl:choose>
        </xsl:when>

        <xsl:otherwise>
            <xsl:if test="$pos > 0 and $pos + string-length( $str ) >= $wrap-col">
                <xsl:copy-of select="$break-mark" />
            </xsl:if>
            <xsl:value-of select="$str" />
        </xsl:otherwise>

    </xsl:choose>
</xsl:template>

The original code had two problems, both involving its use of the pos parameter, which keeps track of how many characters from the string have already been consumed, in order to decide when to wrap. The mistakes:

  1. When the wrapper outputs a space between words, the space is not accounted for in pos; an off-by-one error. This is the bug I spotted.

  2. The value of pos is not reset when breaking a line, so it is relative to the start of the overall string, not relative to the start of the current line; a logic bug. This is what Dave spotted. I didn’t even realise this was the case – it is so obvious that pos must be line-relative that when I started reading my code, I simply assumed it was.
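To make the first mistake concrete, here is a small Python sketch of the bookkeeping only. It is not the old template: the function name `pos_drift` is made up for illustration, and it assumes every separator emitted is a single space (that is, no line break happens along the way).

```python
def pos_drift(words):
    """Return (tracked, actual) character counts after each word.

    tracked: what the old code passed along as pos (word lengths only)
    actual:  characters really emitted, separating spaces included
    """
    tracked = 0
    actual = 0
    rows = []
    for i, word in enumerate(words):
        if i > 0:
            actual += 1  # the space that the old pos never counted
        tracked += len(word)
        actual += len(word)
        rows.append((tracked, actual))
    return rows
```

For `["aaa", "bbb", "ccc"]` the tracked value falls one further behind with every word, so the wrap decision is made against a column that is too small. The second mistake is independent of this: even a correctly counted value would still be measured from the start of the string rather than the start of the line.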

In the old code, the wrapper checks whether a wrapping point has been reached at the start of an iteration, and if so, outputs break-mark, otherwise a space. Then it always outputs first-word (the non-whitespace at the start of the remainder of the string) and finally it recurses, passing the remainder of the string, giving $pos + string-length( $first-word ) as the new pos.

To fix the first bug, the value passed as pos would have to depend on whether a space or break-mark was output. Due to the purely functional nature of XSLT, I found no good way to express this without code duplication. (In an imperative language, I would just increment pos in the conditional block that outputs a space. In XSLT I could see no other way than to test the same condition once in order to know whether to output a space or break-mark, and another time in order to figure out whether to add 1 or 0 to pos.) It seemed that fixing the error would be significantly detrimental to the clarity of the code.
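For comparison, the imperative shape alluded to in the parenthesis might look roughly like this Python sketch. It is an illustration under assumptions, not a port of the old template: the name `wrap_imperative`, the exact overflow test, and the choice to emit the separator before each word are all made up here.

```python
def wrap_imperative(text, wrap_col, break_mark="\n"):
    """Wrap text at spaces, mutating pos inside the branch that emits output."""
    out = []
    pos = 0  # characters emitted so far on the current line
    for word in text.split(" "):
        if pos > 0:  # not at the start of a line: a separator is needed
            if pos + 1 + len(word) >= wrap_col:
                out.append(break_mark)
                pos = 0   # a break starts a fresh line
            else:
                out.append(" ")
                pos += 1  # the space is counted right where it is emitted
        out.append(word)
        pos += len(word)
    return "".join(out)
```

With `wrap_imperative("aaa bbb ccc", 8)` this yields `"aaa bbb\nccc"`. The single `if`/`else` both picks the separator and adjusts `pos`, which is exactly the coupling that is awkward to express without mutable state.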

I got stuck on this for quite a while.

Eventually, I found a sleight of hand that also allows some conceptual simplification, so thankfully, neither brevity nor clarity suffered too much for the result. The trick is to check whether the line is about to overflow, and if so, output break-mark but not first-word, passing down the string to the next recursion unchanged, effectively spinning the loop in place for one iteration. This folds the start-of-line case into the code path for the start-of-string case, and allows unconditionally passing 0 as the value for pos for both of them, which avoids duplication of conditionals – and just so happens to change the semantics of pos such that Dave’s bug is inadvertently also fixed. The form of the conditionals required by this semantic change is also nicely simple compared to those I had before.
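The spin-in-place trick may be easier to follow in a recursive Python sketch that follows the same control flow as the template. The names are transliterations, and `break_mark` is assumed to be a plain string here rather than a node set. The `pos > 0` check in the last branch is this sketch's choice, made so that a lone over-long word is not preceded by a break mark.

```python
def wrap_string(s, wrap_col, break_mark="\n", pos=0):
    """Recursive word wrap; pos counts characters on the current line."""
    if " " in s:
        first_word, _, rest = s.partition(" ")
        pos_now = pos + 1 + len(first_word)  # the word plus its trailing space
        if pos > 0 and pos_now >= wrap_col:
            # About to overflow: emit only the break and recurse on the
            # unchanged string with pos reset, spinning in place once.
            return break_mark + wrap_string(s, wrap_col, break_mark, 0)
        # The word fits (or starts the line): emit it and move on.
        return first_word + " " + wrap_string(rest, wrap_col, break_mark, pos_now)
    # Last word of the string.
    if pos > 0 and pos + len(s) >= wrap_col:
        return break_mark + s
    return s
```

Calling `wrap_string("aaa bbb ccc", 8)` gives `"aaa \nbbb ccc"`. Because the space is emitted after each word, a line keeps its trailing space before the break, and `pos` is only ever passed as 0 or as `pos_now`.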

In the end, the new code turned out slightly longer than the original and also harder to understand, though with the decided advantage of, y’know, working correctly. Fortunately it is nowhere near as bad as I feared it might become when I started on it, and in terms of conditionals, actually got much clearer.

Update: fixed another bug. Previously the condition was just “$pos-now >= $wrap-col”, so words wider than $wrap-col would send the code into infinite recursion: the loop would spin in place at the start of the line without ever advancing through the source string. Now the condition is “$pos > 0 and $pos-now >= $wrap-col”, which means the word which starts a line will always be emitted no matter its length.
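The effect of the extra `$pos > 0` test can be seen in a tiny Python model of just the break decision for a word at the start of a line. The function name `spins_before_progress` and the `limit` cut-off are assumptions of this sketch; the real template has no such cut-off and would simply recurse until the processor gives up.

```python
def spins_before_progress(word_len, wrap_col, guarded, limit=5):
    """Count how often the loop breaks in place before emitting the word.

    Returns None if the word is still not emitted after `limit` spins.
    """
    pos = 0
    spins = 0
    while spins < limit:
        pos_now = pos + 1 + word_len
        overflow = pos_now >= wrap_col
        if guarded:
            overflow = pos > 0 and overflow  # a word at line start always fits
        if not overflow:
            return spins  # the word is emitted and the string advances
        pos = 0           # break emitted, same word retried at line start
        spins += 1
    return None
```

For a 15-character word and a wrap column of 10, the unguarded variant never gets past the cut-off, while the guarded variant emits the word immediately.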

I need a test suite…

iSingularity? (take 2)

Saturday, Jan 30, 2010, 16:45

Yesterday I wondered:

Where are the people who worry about [the iPad] being the future of computing?

Turns out I was just too impatient by a few hours. Alex Payne voiced the same thought and Steven Frank wrote along very similar lines, though in some fundamental ways I disagree with him. Adam Pash wrote a decent piece at Lifehacker, though I think the issue is better covered by David Megginson’s rather wider concern. But the piece that I was looking for, basically word for word, was Mark Pilgrim’s take.

The reason I disagree with Steven is that I think the Old World/New World dichotomy is a red herring. The only real difference between these is what UI metaphor is predominant in each and what supporting concepts are exposed to the user. Steven talks about “Old Worlders” expecting windows, menus and toolbars and other complexity that presumably corresponds to power. But as an Old Worlder, that’s the least of my worries. In my opinion, compared to home computers, personal computers already present huge barriers to tinkering – but merely de facto, due to the sheer complexity of modern systems.

Let me walk down memory lane. I grew up on PCs, not home computers, myself. I boggle in retrospect at how many stumbling blocks the Microsoft ecosystem and culture forced me to overcome. People who grew up on either home computers or Unices had an order of magnitude easier time getting into computing. If I had been less doggedly curious, that differential could easily have been enough to keep me away. Things haven’t gotten better since, and meanwhile the complexity of modern computers has only increased. But the defining situation for children and teenagers is that they have no money but an infinite supply of time. In the Microsoft ecosystem, those were largely fungible – and so I overcame.

On the iPad? Not a chance. The iPad’s answer to the problems of personal computers is to simplify the UI – which is good. But the complexity under the hood isn’t even a concern. And that’s because it legislates a barrier to entry for tinkerers. No one can do anything with it that Apple does not approve of – in Adam Pash’s words, Apple’s gotten into the habit of acting like you’re renting hardware. Now, you can tinker – but you need a Mac and an iPhone dev licence: a large wad of cold hard cash, exactly what children and teenagers don’t have. (Some of them will have parents who understand why this is a good idea and can provide the spare cash. I was out of luck on both counts.) On the iPad, the barrier to entry is so ridiculously high that I would not have been able to surmount it.

In contrast to Steven’s thesis, I posit that the iPad represents no trend reversal, but rather is poised to be the bend in the hockey stick shape of a curve we have been riding for a long time – as Robert Young points out:

When IBM created the Personal Computer in 1981, it predicted 2,500/year in sales. They based this estimate on a specified use case: users (assumed to be engineers, scientists, etc.) would write programs for their own use, and run same on their Personal Computer. To that end, IBM made available 3 operating systems from which the user could choose the one to his liking: CPM/86, UCSD P-System, PC-DOS. It was envisioned to be a mainframe on a desk.

And so it was until… Lotus released 1-2-3, and only for PC-DOS. At that point the light bulb went off around 128 and the Valley: what IBM had created was an Office Stove, a device for which the User DIDN’T write the programs to be run, but which could bake all sorts of delight food stuffs. That IBM didn’t restrict the BIOS and didn’t secure the OS’s made Billy Boy rich. And quite a few programmers.

The iPad is just the most extreme extension of this paradigm: it’s an appliance, but significantly less open to the gaggle of Cooks in the wild. Users, in Steverino’s mind, couldn’t care less whether the Cooks are indentured servants to Apple. They don’t even care that they are locked-in to Apple. They just know that the tarts taste good.

The iPad is not a revolution. It is right in line with where we have been going for decades. If it represents anything fundamental, that is the courage to throw out an ill-fitted UI metaphor to better serve this direction.

But how would the fundamental experience of the device suffer if Apple shipped a dev environment with the iPad, just like one used to be part of every home computer (incl. the Apple II)? Is that really an inconceivable proposition? Or heck, it could be a $20 download on the App Store for all I care. That’s no hurdle for a teenager, not even a big one for a preteen. Why must the iPad require a dev licence and a Mac to write code for? (Obviously: because that makes Apple a lot of money.)

The current personal computer is a bad paradigm. What I was hoping for was a move toward things like Alan Kay’s visions – a simplification of programming to the point where everyone (especially kids) can do it so easily, at least for simple tasks, that it becomes routine. The iPad is the direct opposite of that.

The irony in all this is that for all of how much Adobe Flash gets lambasted in the Apple sphere (and make no mistake, I am not enamoured with Flash on any level), it let Joe Gregorio’s 13-year-old create his first game, which was followed by many others. And a successful iPad would close even this unsatisfying avenue.

Is the future we’re getting the one we really want?