Monday, July 21, 2014

Why is the Fundamental Theorem of Software Engineering Fundamental?

Have you heard of the adage "All problems in computer science can be solved by another level of indirection."? If you are a programmer, chances are you have read about it in a book or an article talking about how to best structure the software you write. It's been dubbed the fundamental theorem of software engineering (the FTSE) so you should know what it is about. If you don't then, quickly, go read up on the subject at ... no, I won't do the googling for you.

A common explanation you will find is that the FTSE is talking about abstraction layers: another level of indirection is achieved by raising the abstraction level. That's a good way to view it, but only some of it. Abstraction, when realized in software, often results in a layer so that the details of whatever is on the other side remain hidden, and a layer causes references to go in round about ways to get to the point, hence indirection. However, there are other forms of indirection where one just wants to reduce coupling between two parts. Here we are not really getting rid of any details, but maybe creating a common language about them. Think of the Adaptor pattern for example. Not convinced? Go read the insightful commentary on the subject by Zed Shaw at http://zedshaw.com/essays/indirection_is_not_abstraction.html.

So, just for the fun of it, how would you explain the FTSE to a neophyte? Wikipedia defines indirection as the ability to reference something using a name, reference, or container instead of the value itself. This is both misleading and revelatory. Misleading because this is a much more general definition than the intuition most programmers have about the FTSE. After all, everything relies on naming and references, so what could a theorem stating the obvious have to teach us? But it's also revelatory because it hints at the fact that much of software engineering, and therefore much of computing, is about answering the equally famous question what's in a name. Therefore, to apply the FTSE in practice we need to allow ourselves to answer that question the best we can at any point in time. That is, we need to be able to define the meaning of a name within any given context. The meaning of a symbol in a software system of even moderate complexity quickly starts exhibiting nuances and goes through changes much like words in natural language. This is because just like natural language is a complicated symbolic model of the messy world, a software system can be similarly characterized.

In a sense software, any software, is a model of the world. The symbols we use inside a software program acquire meaning (or semantics if you will) by virtue of what the entities they refer to actually do. A function's meaning is its implementation. The meaning of a piece of data is how it's used, how it interacts with other data. And programming types govern that usage to some extent, but mostly it is about what we do with the data inside the program. For instance, to discover the meaning of a field in a data structure about which you have no documentation, you would trace where and how it is being used. New requirements or changes and adjustments to old requirements are nothing more than a refined model of the world and/or system that needs to be reflected in software. This process of refining a model is enriching the meaning of the terms used to describe things and that translates into modifying the semantics of certain symbols in the software. Which is what we call "changing the implementation". Now, the practice of programming in not seen as creating and refining meaning of symbols, but I believe that is a very important perspective. It is a perspective where the resolution of symbolic references in context is at the foundation of programming, instead of the usual "giving instructions to a machine" view. I came to this conclusion about 15 years ago while designing a programming language and looking at various theoretical constructs in PL design, their limitations and trade-offs. Over the years, I have developed a reflex to see a lot of design problems in terms the important symbols in play, the context, both lexical and runtime (or static and dynamic if you prefer), and what is the resolution process of those symbolic references. I also see a lot of programming language constructs as being in essence a bag of reference resolution tools. That is, programming constructs are in large part tools a programmer has at their disposal to define the meaning of symbolic references. Let's take a look at some.

Variables


Variables are the quintessential programming tool. Probably the first construct you ever learn in programming, directly lifted from algebra, it is the simplest form of abstraction - use a name instead of a value. Of course, as the term "variable" suggests, the whole point is for the thing itself to change, something that is able to vary. So in effect a variable both establishes an identity and an interface for changes to be affected, thus bypassing all sorts of metaphysical problems in one nice engineering sweep. Conceptually variables are direct symbolic references, associations between a name and a value "container". Implementation wise, they are usually the same thing as pointers to memory locations and that's how they've always been understood. In fact, this understanding is a consequence of the fact that in compiled languages the name of a variable disappears from the compiled version, it is replaced by the memory location. A benefit of this strategy is that the cost of using a variable is as low as it can be - just a RAM memory cell access. On the other hand, any flexibility in how the name is to be interpreted at runtime is completely gone.

Note that we are not talking here about the distinction between "reference variables" vs. "primitive data variables" or whatever. Though that's an important distinction, what we are concerned about is merely the fact that variables are names of things. What is thought of as "reference variables" (or pointers) in various languages has to do with how they are copied during an assignment operation or as a function argument, whether the value is mutable or not etc.

Aliases and Macros

Aliases are relatively uncommon as a separate construct in modern languages. When pointers are assigned to each other, this is called aliasing because we have two names for the same memory location, i.e. two references with the same referent. For example, the assignment of any variable of Object type in Java is considered aliasing. While we do have another level of indirection here since we could change one reference without disturbing the other, this type of aliasing is not particularly interesting. But consider another type of aliasing, through macros (think C #define) where a name is explicitly declared to be a replacement of another name. The indirection here involves an extra translation step and the meaning of the alias in this case is not that it has the same referent, but that its referent is the original name. As a consequence, mentioning the alias at a place in the program where the same symbol is used for an entirely different thing will make the alias refer that that thing. Another benefit of this kind of aliasing is that it can be used to refer to other things besides variables, for example types, especially when a type description is a complex expression that we don't want to write over and over again. Macros are also in the same spirit - languages that recognize the value of compile-time decision making will offer a macro facility.  And it is a shame they are not more popular. They just have a bad aura because they are associated with purely textual manipulation, a completely separate language on its own. However, if one sees them as a way to do structured code generation, or compile-time evaluation, they are much more appealing. One has to realize that macros and aliases are about name & conquer just as much as variables and functions are, and that they are in fact a great level of indirection mechanism. A mechanism that occupies another spot in the compile-run time dimension. Speaking of which, the fact that there is nothing in between that strict compile vs. run-time separation is strong limitation in language expressiveness. Partial evaluation techniques could be what macros at run time look like, but those are still confined to academic research mainly.

To sum up so far: the key difference between variables and aliases is the timing of the reference resolution process. With variables, the referent is obtained when the program is running while with aliases it is obtained at compile time. In one case we have the context of a running program, in the other the context of a running compiler.

Overloading


Overloading is a programming mechanism that allows one to define different implementations of the same function depending on the arguments used during invocation. In other words, overloading lets you associate a different meaning with a given name depending on the syntactical context of usage of that name. It's a context-dependent reference resolution process that happens generally at compile time, hence within a static context. A rough natural language analogue would be homonyms that have different meanings only because used as a different part of speech. For example, in "all rivers flow" and "the flow is smooth" the semantic import is the same, but the strict meaning is different because in one case we have  a verb while in the other we have a subject. A variation on the theme is Common Lisp and its generic functions where the dispatching can be defined on an actual object, via an equals predicate. In that case the context for the resolution is a dynamic one, hence it has to do more with semantics in a sense. That's more like homonymy where the semantics of a word depend on the semantics of the surrounding words.

Overriding

Overriding is about changing the meaning of a name in a nested lexical scope. I deliberately use the word meaning here to talk about whatever a name refers to, understanding that often the meaning of a symbol in a programming language is simply the referent of that symbol. A nested lexical scope could be a nested function, or a class declaration or some code block the delimits a lexical scope. In some cases, one looses the ability to access the referent from the enclosing scope. In others, the programming language provides a special syntax for that. (i.e. the super keyword in Java).  Again, we are talking about a reference resolution mechanism in a certain specialized context. The context is completely specified by the lexical scope and is therefore static. Note that people somewhat erroneously use the word overwrite instead of override. The correct term in Computer Science is override. In English, it means to cancel a previous decision whereas overwrite literally mean to write over something. More on the mechanics of overriding below.

Classes 

A compound structure such as an object in JavaScript, or a class in Java/C# or a struct in C is, among other things, a context where certain names have meanings. Specifically, the fields and methods that belong to that structure are references that are resolved precisely in the context of the runtime object. Well, actually it depends. An interesting case are the static variables in Java classes. A static variable is a member of the class object rather than of its instances. One way teachers of the language describe static variables is that all objects of that class share the same variable. That's even how the official Java documentation talks about static variables: in terms of variables that all objects share (see http://docs.oracle.com/javase/tutorial/java/javaOO/classvars.html). But that is inaccurate because an object (i.e. an instance of the class) is not the proper context to refer to the static variable. If we have:

class A { public static int a = 0; }
class B extends A { public static int a  = 1;}
A x = new B();
System.out.println(x.a); // what are we supposed to see here?

What does the last statement print out? That sounds like a good entry level job interview question. The question comes down to what is the proper context for the reference resolution of the name a? If we see static variables as variable shared by all objects of the same class, the value should clearly be 1 since x's actual type is B and B.a is what all objects of type B share. But that is not what happens. We get 0 because Java only cares about the declared type which in this is A. The correct way to talk about static variables is as member variables of the object class (in Java, every class is an object at runtime, an object whose type is java.lang.Class). This is why some recent Java compilers issue a warning when you refer to a static variable from an object context (though not the JDK7 compiler!) To be fair to official documentation, the above mentioned tutorial does recommend use of the class rather than the object to refer to static variables. However, the reason given is that otherwise it does not make it clear that they are class variables. Now, if we had non-static member variables? We get the same result: the declared type of the object variable x is what matters, not the actual runtime type. If instead of accessing a public variable, we were accessing  a public function then the runtime type would have been the one used to resolve the reference.

So why is that the case? Because part of the reference resolution process happens at compile time rather than at run-time. Traditionally, in programming languages a name is translated to a memory location during compilation so then at runtime only the location matters and the referent is obtained the fastest possible way. With so called "virtual methods", like non-static methods in Java, we get to do an extra hop to get to the memory location at runtime. So for variables, both static and non-static, and for static methods, the reference resolution is entirely based on the static context (type annotations available at compile time) while for non-static function it becomes dynamic. Why is only one kind of name inside a class's lexical context resolved in this way. No deep reason, just how Java was designed. Of course I could have just said "virtual tables only apply to non-static functions", but that's not the point. The point is that in defining programming constructs, an important part of the semantics of a programming language is based on narrowing down what is the context and the process for reference resolution in all the various ways a symbol can be mentioned in a program. For most mainstream languages, this only applies to identifiers, a special lexical category, but it is in fact more general (e.g. operators in C++ or Prolog, any type of symbol in Scheme).  A common name for this kind of deferring of the reference resolution is late binding. And everybody likes it because "it's more flexible". Good.

Closures

To finish, let me mention closures, a term and a construct that has thankfully made it into mainstream programming recently. The power of closures stems from their ability to capture a computational context as a reference resolution frame. It is a way to transcend the rigid lexical scoping somewhat and have a piece of code rely on a set of variables that are hidden from the users of that code, yet whose lifetime is independent of its execution. So the variables captured in a closure behave like global variables in terms of their lifetime, but like local variables in terms of their scope. So they represent a very particular intersection between the design dimensions of visibility&lifetime. But what that does in effect, to put it in more general terms, is that the meaning of a name is carried over from one context to another without it interfering with meanings of that same name in other contexts.

Okay. So we started with a software engineering principle and we dug somewhat into a few programming language concepts from the perspective of reference resolution. Naming and reference are philosophical problems that are probably going to get resolved/dissolved soon by neuroscience. In the meantime, the cognitive phenomenon and whatever philosophy and linguistics has thought us so far about it could serve a bit more as an inspiration to how people communicate with machines through programming. So you can see where I'm heading with the question posed at the title of this post. It is the reference resolution that is fundamental an indirection is simply what we do to (re)define what that resolution process ultimately looks like. I have more to say about it, but this already got a bit long so I'll stop here.