On employing RFC 2119 vernacular

Tuesday, 26 Sep 2006

In my discussion with Alastair about his common blog export format proposal, a few thoughts crystallised in my mind that I wanted to jot down. Herewith, then, some short notes.

RFC 2119, in case you don’t know, is a document that defines terms for three levels of requirement: MAY/OPTIONAL, SHOULD/RECOMMENDED and MUST/REQUIRED (as well as the negated forms MUST NOT and SHOULD NOT). The purpose of these definitions is to give the terms an unambiguous meaning so that they can aid in writing specifications.

One area that is often misunderstood is the use of SHOULD-level requirements. This is none too surprising: among the three terms, “should” is the most ambiguous in everyday use. In RFC 2119, “SHOULD (NOT)” means “you may deviate from this and we won’t stop you if you want to hurt yourself, but weigh the consequences.” It is called for when disregarding the requirement would jeopardise interoperability, i.e. when doing so would cause different implementations to fail to communicate with each other sensibly, even if it does not entirely prevent them from communicating. An example would be labelling the payload of an envelope format in a non-standard way, so that other decoders of the envelope format will nominally be able to parse the envelope, but will see the content as a binary blob which they cannot process further.
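The envelope example can be sketched in a few lines of Python. Everything here is made up for illustration: the JSON envelope with its `content_type` and `payload` fields, the `KNOWN_TYPES` registry and the label `text/x-my-own-label` are assumptions, not part of any real format. The sketch only shows what ignoring a SHOULD tends to look like from the receiving end.

```python
import base64
import json

# Hypothetical registry of payload labels this decoder understands.
KNOWN_TYPES = {
    "text/plain": lambda raw: raw.decode("utf-8"),
    "application/json": lambda raw: json.loads(raw.decode("utf-8")),
}


def make_envelope(content_type, text):
    """Wrap text in a made-up JSON envelope with a base64 payload."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"content_type": content_type, "payload": payload})


def decode_envelope(envelope_text):
    """Parse the envelope and return (understood, content)."""
    envelope = json.loads(envelope_text)
    raw = base64.b64decode(envelope["payload"])
    handler = KNOWN_TYPES.get(envelope["content_type"])
    if handler is None:
        # The envelope itself parsed fine, but the label is unknown,
        # so all the decoder can hand back is an opaque blob.
        return False, raw
    return True, handler(raw)


# A sender that follows the (hypothetical) SHOULD and uses a standard label.
conforming = decode_envelope(make_envelope("text/plain", "hello"))

# A sender that deviates: same bytes, but a label nobody else knows.
deviating = decode_envelope(make_envelope("text/x-my-own-label", "hello"))
```

Both envelopes parse, so nothing is broken at the MUST level. The deviating sender has merely made its content useless to everyone but itself, which is exactly the kind of consequence a SHOULD asks you to weigh.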

The fact that these terms are highly formal should not deter you from using them casually when intent warrants it. RFC 2119 language is called for wherever there is a formal requirement of some sort. In practical terms, this mostly correlates with whether the given statement spells out a condition that would have to be checked in code that implements the spec. Certainly you want to formally define every aspect of the specification whose implementation you’d exercise with a unit test.
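To make the correlation concrete, here is a minimal sketch assuming two invented requirements for a blog export format: “an entry MUST have a non-empty title” and “an entry SHOULD carry a publication date”. Neither statement comes from an actual proposal, and the names `validate_entry`, `title` and `published` are hypothetical. Each requirement turns into one check, and each check into one test.

```python
def validate_entry(entry):
    """Check an entry (a dict) against two hypothetical requirements.

    Returns (errors, warnings): MUST violations are errors,
    SHOULD violations are only warnings.
    """
    errors, warnings = [], []
    # Hypothetical spec text: "An entry MUST have a non-empty title."
    if not entry.get("title"):
        errors.append("MUST: title is missing or empty")
    # Hypothetical spec text: "An entry SHOULD carry a publication date."
    if "published" not in entry:
        warnings.append("SHOULD: published date is missing")
    return errors, warnings


def test_must_title():
    errors, _ = validate_entry({"title": "", "published": "2006-09-26"})
    assert errors == ["MUST: title is missing or empty"]


def test_should_published():
    errors, warnings = validate_entry({"title": "Hello"})
    assert errors == []
    assert warnings == ["SHOULD: published date is missing"]


def test_conforming_entry():
    entry = {"title": "Hello", "published": "2006-09-26"}
    assert validate_entry(entry) == ([], [])


if __name__ == "__main__":
    test_must_title()
    test_should_published()
    test_conforming_entry()
```

Treating the SHOULD as a warning rather than an error is one possible design choice: it lets an implementation accept deviating input while still making the deviation visible.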

Writing specifications is a weird beast of an activity. It is programming – except not of computers, but programmers. In a sense we might call it meta-programming: a code that instructs programmers about how to write a program. (Remember the original sense of “meta” in metaphysics: “beyond the physics” – where “beyond” prosaically refers to which shelves in the library such books were to be found. But I digress.) The language this code is written in, obviously, is human language. In human communication, ambiguity and imprecision improve efficiency – and lend themselves to poetry besides. Specifications, however, need to be as precise as possible (including being precise about where they’re not). For better or for worse, poetry is rarely desirable in a specification, but unambiguous language is, and that is where RFC 2119 comes in. It is a syntax for specification programming, if you will. Think of RFC 2119 statements as the code, and of informal statements as comments.
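The code-versus-comments analogy can even be taken literally. The following is a rough sketch, under the assumption that normative statements are marked by upper-case RFC 2119 keywords and that sentences end in a full stop followed by whitespace; the sample text and the function name `classify` are invented for illustration. It separates the “code” of a specification from its “comments”.

```python
import re

# Longer phrases come first so that "MUST NOT" is not reported as "MUST".
KEYWORD = re.compile(
    r"\b(MUST NOT|SHALL NOT|SHOULD NOT|NOT RECOMMENDED|"
    r"MUST|REQUIRED|SHALL|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b"
)


def classify(spec_text):
    """Return a list of (keyword_or_None, sentence) pairs.

    Sentences with an upper-case RFC 2119 keyword are the "code";
    sentences without one (keyword is None) are the "comments".
    """
    # Naive sentence splitting: good enough for a sketch only.
    sentences = re.split(r"(?<=[.!?])\s+", spec_text.strip())
    result = []
    for sentence in sentences:
        match = KEYWORD.search(sentence)
        result.append((match.group(1) if match else None, sentence))
    return result


# Invented sample specification text.
SAMPLE = (
    "Most blogs list entries newest first. "
    "An exporter MUST emit a title for every entry. "
    "An exporter SHOULD NOT omit the publication date. "
    "This mirrors what feed readers expect."
)

classified = classify(SAMPLE)
keywords = [keyword for keyword, _ in classified]
```

The match is deliberately case-sensitive: a lower-case “should” in running prose stays a comment, and only the capitalised form counts as code.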

That should make it obvious why formal language is always appropriate. Do not shy away from using it: it draws a clear distinction between code and comments and makes the code more precise.