Paul Graham’s kind of dirty

Wednesday, Jan 30, 2008, 02:07

Arc is finally released, as what by the sound of it is a wildly unfinished snapshot. In his notes about the decision, Paul expounds on the rationale for design decisions such as skipping Unicode support [1] and writing HTML libraries that output presentational tables. Let’s take two quotations and put them next to each other.

One is always a bit sheepish about writing quick and dirty programs. And yet some, if not most, of the best programs began that way. And some, if not most, of the most spectacular failures in software have been perpetrated by people trying to do the opposite.

So experience suggests we should embrace dirtiness. Or at least some forms of it; in other ways, the best quick-and-dirty programs are usually quite clean. Which kind of dirtiness is bad and which is good? The best kind of quick and dirty programs seem to be ones that are mathematically elegant, but missing features – […]

Arc tries to be a language that’s dirty in the right ways. […] The kind of dirtiness Arc seeks to avoid is verbose, repetitive source code. The way you avoid that is not by forbidding programmers to write it, but by making it easy to write code that’s compact. One of the things I did while I was writing Arc was to comb through applications asking: what can I do to the language to make this shorter?

Clearly, the way to make programs written in the language shorter is to force them to deal with Unicode on their own.

He does make an effort to poison the well by saying that people who would care about these things probably wouldn’t like Arc much to begin with. Maybe the fact that I happen to speak all of Greek, German and English means that I shouldn’t have an interest in Arc, then.

Note that I’m not saying the first unfinished cut of a language needs Unicode support; but Paul seems to go rather further than that. Note further that I don’t care one whit about his silly claims that presentational markup is the right kind of dirtiness: that can always be fixed by libraries. Character strings, however, are something you really do need to get right at the core language level. You cannot leave strings for the libraries to fix. If you think that’s a viable route, I have a bridge to sell you. And it’s written in C++.
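To make the octets-versus-characters point concrete, here is a rough sketch in Python, picked purely for illustration (it is not Arc, and the helper names `truncate_octets`, `truncate_chars` and `is_valid_utf8` are made up for this example). It shows what happens to a Greek word stored as UTF-8 when the string type counts octets instead of characters.

```python
# Sketch: octet strings vs. character strings, using Python's bytes and str
# types as stand-ins. Arc itself is not shown; helper names are hypothetical.

word = "Γεια"                  # Greek: four characters, no diacritics
octets = word.encode("utf-8")  # the same text as raw UTF-8 octets

def truncate_octets(data: bytes, n: int) -> bytes:
    """What an octet-only string type does: cut after n bytes."""
    return data[:n]

def truncate_chars(data: bytes, n: int) -> bytes:
    """Character-aware cut, which the programmer must spell out by hand
    when the language only knows octets: decode, slice, re-encode."""
    return data.decode("utf-8")[:n].encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Report whether the octets still form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(word), len(octets))                     # 4 8
print(is_valid_utf8(truncate_octets(octets, 3)))  # False: cut mid-character
print(truncate_chars(octets, 3).decode("utf-8"))  # Γει
```

The point is not Python’s particular API but the division of labour: when the core string type counts characters, the obvious one-liner is correct; when it counts octets, every program carries its own decode-and-re-encode dance.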

Consider next that he mentions up front that it took Guido van Rossum the entire past year to rework Python’s character string support because of backward-compatibility issues. (Other languages have had similar experiences.)

Leaving Unicode support in a language “for later” means you will spend a huge chunk of time sometime in the future to put it into the language – or you won’t, and then programs written in that language will forever be verbose when dealing with strings.

The right kind of dirty?

Footnotes:

  1. He says it supports only ASCII; I think he means octets instead, but I doubt he cares one way or another.