int Considered Harmful; or, Are Computer Languages Too General?

The flexibility of computer languages is considered to be one of their sources of power. The ability of a computer to do, within the limits of tractability and Turing-completeness, anything at all with data is considered one of the great distinguishing features of computer science. Something that surprises me is how early we fell into this in the history of computing: very early programmable computer systems were already using languages that offered enormous flexibility. There was no multi-decade struggle in which we worked through various domain-specific languages before the invention of Turing-complete, general-purpose languages marked a turning point in the development of programming. As-powerful-as-dammit languages were there from the start, whether by accident or because programming languages were already building on a strong tradition in mathematical logic and related fields.

Yet, in practice, programmers don’t use this flexibility.

How often have we written a loop such as for (int i = 0; i < t; i++)? Why, given the vast flexibility to put any expression from the language into those three slots, do we hardly ever put anything other than a couple of different things in there? I used to feel that I was an amateurish programmer for falling into these clichés all the time: surely real programmers used the full expressivity of the language, and it was just me, with my paucity of imagination, who wasn't doing this.
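To make the contrast concrete, here is a small sketch in C (since the example above is C-style); the data and variable names are just illustrative. The first loop is the cliché we all write; the second is one of the countless loops the grammar equally permits but that we almost never reach for.

    #include <stdio.h>

    int main(void) {
        int data[] = {3, 1, 4, 1, 5, 9, 2, 6};
        int t = sizeof data / sizeof data[0];

        /* The cliché: a plain "stepper" loop, written millions of times. */
        for (int i = 0; i < t; i++) {
            printf("%d ", data[i]);
        }
        printf("\n");

        /* Equally legal, almost never written: the three slots can hold
           arbitrary expressions, here walking a string by pointer while
           accumulating a side effect inside the loop condition. */
        int sum = 0;
        for (char *p = "loops"; *p && (sum += *p); p++)
            ;
        printf("sum of character codes: %d\n", sum);

        return 0;
    }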

But it isn't just me. Perhaps, indeed, the clichés are a sign of maturity of thinking, a sign that I have learned some of the patterns of thought that mark a mature programmer?

The studies of Roles of Variables put some meat onto these anecdotal bones. Over 99% of the variable usages in a set of programs from a textbook were found to be playing just one of around ten roles. An example of a role is the most-wanted holder, where the variable holds the “best” value found so far, for some problem-specific notion of “best”. For example, it might hold the current largest value in a program that is trying to find the largest number in a list.
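As a concrete illustration of the most-wanted holder role, here is a minimal C sketch of that find-the-largest example; the function and variable names are my own, not taken from the Roles of Variables papers.

    #include <stdio.h>

    /* "largest" plays the most-wanted holder role: it always holds the
       best value seen so far, for the problem-specific meaning of "best"
       (here, numerically largest). "i" plays the stepper role. */
    int largest_of(const int *values, int n) {
        int largest = values[0];          /* most-wanted holder */
        for (int i = 1; i < n; i++) {     /* stepper */
            if (values[i] > largest) {
                largest = values[i];
            }
        }
        return largest;
    }

    int main(void) {
        int data[] = {3, 1, 4, 1, 5, 9, 2, 6};
        printf("%d\n", largest_of(data, 8));   /* prints 9 */
        return 0;
    }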

There is a decent argument that we should make these sorts of things explicit in programming languages. Rather than saying “int” or “string” in variable declarations, we should instead (or additionally) say “stepper” or “most-recent holder”. This would allow additional pragmatic checks of whether the programmer is using the variable in the way that they think they are.
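No such syntax exists in C, of course, but as a rough sketch of the idea, role names could be layered onto ordinary declarations as annotations that a separate checker inspects. The ROLE macro below is purely hypothetical; the role names (fixed value, stepper, gatherer) are from the Roles of Variables work.

    #include <stdio.h>

    /* Hypothetical role annotation: the macro expands to nothing, so the
       compiler ignores it, but a separate checker could read the role
       name and verify that the variable is used in the declared way,
       e.g. flag a "stepper" ever assigned a value outside its sequence. */
    #define ROLE(name)

    int main(void) {
        ROLE(fixed_value) const int limit = 5;  /* set once, never changed  */
        ROLE(stepper)     int i;                /* runs through 0..limit-1  */
        ROLE(gatherer)    int total = 0;        /* accumulates a running sum */

        for (i = 0; i < limit; i++) {
            total += i;
        }
        printf("%d\n", total);   /* prints 10 */
        return 0;
    }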

Perhaps there is a stronger argument, though. Might we be able to reason more powerfully about such a restricted language than about a general one? There seems to be a tension between the vast Turing-complete capability of computer languages and the desire to verify and check properties of programs. Could a subset of a language, in which the role-types had much more restricted semantics, allow more powerful reasoning systems? There is a related but distinct argument that I heard a while ago, that we should develop reasoning systems that verify properties of Turing-incomplete fragments of programs (I’ll add a reference when I find it, but I think the idea was at a very early stage).
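A small illustration of the kind of reasoning this could unlock, under the assumption that a “stepper” may only move monotonically through a fixed finite range: termination of the first loop below follows mechanically from the role, while the second requires reasoning about arbitrary program state. The update_step function is just a placeholder for problem-specific code.

    #include <stdio.h>

    /* Placeholder for arbitrary problem-specific code. */
    static int update_step(int state) { return state / 2; }

    /* "i" is a stepper over 0..n-1; the body never writes to it, so a
       checker that only has to understand stepper semantics can conclude
       termination mechanically. */
    static int sum_all(const int *values, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += values[i];
        }
        return total;
    }

    /* In the unrestricted language the same question is hard: whether
       this loop terminates depends on whatever update_step really does,
       and in general that is undecidable. */
    static int iterate_to_fixpoint(int state) {
        while (state != update_step(state)) {
            state = update_step(state);
        }
        return state;
    }

    int main(void) {
        int data[] = {3, 1, 4, 1, 5, 9, 2, 6};
        printf("%d %d\n", sum_all(data, 8), iterate_to_fixpoint(100));
        return 0;
    }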

Les Hatton says that software is cursed with unconstrained creativity. We have just about got to a decent understanding of our tools when trends change, and we are forced to learn another toolset, with its own distinctive set of gotchas, all over again. Where would software engineering have got to if we had focused not on developing new languages and paradigms, but on becoming master-level skilled with the sufficiently expressive languages that already existed? There is a similar flavour here. Are we using languages that allow us to do far more than we ever need to, and thereby limiting the reasoning and support tools we can provide?
