Alien Languages

The recent movie Arrival treats an imagined arrival on earth by alien beings. The United States government, at a loss to understand the visitors’ intentions, conscripts the film’s hero—unusually for Hollywood, a linguist—to help understand the aliens’ language, and in turn, their purpose.

The aliens’ language’s “freedom from time” evokes the functional programming language Haskell.

The linguist, Louise Banks, soon makes headway. She discovers that the aliens’ language “has no forward or backward direction” and “is free of time”. Moreover, in a nod to the (unfortunately, all-but-discredited) Sapir–Whorf hypothesis—according to which, as Banks suggests, “the language you speak determines how you think and… affects how you see everything”—Banks soon finds her own cognition shifting:

If you learn it, when you really learn it, you begin to perceive time the way that they do, so you can see what’s to come. But time, it isn’t the same for them. It’s non-linear.

Far from inducing a reaction of incredulity and awe, these descriptions of the visitors’ language provoked in me just one persistent response: “This is just like the programming language Haskell.”

Haskell is a computer programming language in the functional programming paradigm; as such it differs deeply in nature from the more standard imperative programming languages like Java and C++. Haskell, fascinatingly, provides concretely on earth what the movie Arrival can only gesture at in worlds beyond: a language free of conceptions of order and time, one that proceeds “non-linearly” and even seems to violate standard principles of causality.

This summer I wrote a chess engine in Haskell. Chess appears in Arrival, in fact: Louise mentions the prospect of communicating with the aliens through chess—“every idea” would be “expressed through opposition, victory, defeat,” she points out (with due trepidation). For my part, I contend against far less interesting rivals: the denizens of the Free Internet Chess Server, where my engine, packaged into a server-capable robot (also written in Haskell), boasts ratings of 1600 in blitz and 2000 in lightning.

My focus in this post, however, is not my chess program, but rather the underlying language, whose hold on me was only intensified by the parallels I observed in Arrival.

To begin exploring these parallels, let’s briefly review imperative and functional languages. An imperative program, essentially, provides a list of operations to be executed sequentially and in order. This order—or control flow—can, of course, be interrupted, branched, or re-routed through various instructions in the program; at the end of the day, though, an imperative program reads like a cursor chugging downwards through a list of instructions. The program, meanwhile, maintains a memory of values and variables, which are themselves read, manipulated, and rewritten throughout the course of the program. The value of one or more of these variables upon the program’s termination is often the value desired by the programmer.

A functional program, on the other hand, specifies a network of data dependencies, in which an “outermost” expression, whose value we’d ultimately like to compute, invokes the help of further expressions, which invoke the help of further expressions, and so on. This network of expressions has no inherent order, and indeed which expressions are invoked will depend on the inputs fed into the program; moreover, it’s not hierarchical, as dependencies in the network can be self-referential. There is no external store of mutable variables; rather, values (which might themselves be complex unevaluated expressions) are continually passed “inward” as parameters (evaluated when necessary) and propagated “outward” as functions’ evaluation or return values.

To see what this means, let’s look at an example. I’ll describe, in both imperative and functional styles, a basic routine that takes a chess position and yields a single number describing the “score”, or evaluation, of that position—with the caveat that if the position is illegal, we’ll instead assign a mating penalty.

Here’s the imperative version:

float eval(int depth, Game game):
   if (depth == 0): return heuristic(game)
   Game[] children = {--code that generates pseudo-children--}
   boolean legal = {--legality test, which invokes children--}
   if (not legal): return mating_penalty(game.player)
   for each child in children:
       child.eval = eval(depth-1, child) //this is expensive!!!
   float best = find_extremum(game.player, children)
   return best

The execution of this routine proceeds from the top to the bottom. We first check a recursive stop condition; if the position is a leaf node, we return immediately after a call to the constant-time heuristic function. Barring this case, we first generate the position (game)’s pseudo-legal children and then check for legality (namely, for whether the parent who created game left his king in check); as this latter step amounts to checking whether any of these pseudo-legal moves captures the opponent’s king, we see that the order of these two steps cannot be reversed. Only at this point may we, if we do discover illegality, cut the process short. If we do not, we then undertake the resource-intensive step of recursively evaluating game’s children, before finally choosing the best child (i.e. the highest or lowest, depending on who is to move) and returning its eval.

(Even here, I’ve been generous to the imperative paradigm. In practice, the code segments supplying the values of children and legal would probably not consist of single assignments, but rather of larger blocks consisting of assignments followed by various imperative manipulations, probably involving loops.)

Though this standard exercise in the theory of sequential games may seem a bit daunting, the important thing, I contend, is to observe how time-dependent my summary was. Notice, in particular, the prevalence of words like first, next, and then.

Here is the functional version:

eval depth game = if depth == 0
                     then heuristic game
                     else if not legal
                             then mating_penalty (player game)
                             else best
   where
      best      = find_extremum (player game) populated
      populated = map (eval $ depth-1) children -- EXPENSIVE!
      legal     = {--a boolean expression, involving children--}
      children  = {--expression for a list of pseudo-children--}

This fascinating bit of pseudo-code also demands explanation. Note first that there are no return statements (which terminate the program’s control flow); meanwhile, expensive expressions, like that defining populated, simply sit in scope without qualification. How exactly, in this case, do we preserve the advantages of the previous code, in which, for instance, no children were recursively evaluated in the case of an illegal position?

These advantages, sure enough, persist all the same here, though in various functional guises. Indeed, the four lines of code under the where keyword are better thought of as (static, permanent) definitions than as commands to be executed; moreover, each definition’s right-hand side is only evaluated when it is needed by a higher-level expression. In fact, the order of these four could just as well be scrambled without affecting the program (in Haskell, any set of definitions sharing the same scope may be re-ordered freely!). (I’ll also note that the same effect could be achieved without using where; the keyword serves only to facilitate the aesthetically convenient option of extracting large expressions from the contexts in which they are used. The expressions could have been written inline instead.)
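To make the reordering claim concrete, here is a tiny example of my own (the names are hypothetical, and have nothing to do with the chess engine); the two versions below are entirely equivalent:

hypotenuse a b = sqrt total
   where
      total = aSq + bSq
      aSq   = a * a
      bSq   = b * b

hypotenuse' a b = sqrt total
   where
      bSq   = b * b
      aSq   = a * a
      total = aSq + bSq

Only the dependency structure among total, aSq, and bSq matters; the order in which the bindings appear on the page does not.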

When the top-level routine eval is invoked (or rather, as pedants would insist, when its value is demanded), the only data requested at first is the value of depth. In the case that this is 0, a call to the heuristic function provides all the information that eval needs, and its job is considered done.

I’ll pause here to stress that the if then else expression in Haskell is not a routing of control flow, as in the imperative case. (Note that underneath the then and else headings reside expressions, and not statements.) It’s best thought of as a ternary function which takes a Boolean value and two values of type a and returns an a (in our case a is a floating-point real number), returning the first a if the Boolean evaluates to True and the second if it evaluates to False. (It’s essentially a multiplexer.) Of course, this function is lazy in its arguments; crucially, assuming the (say) second argument is not a literal but rather a complex expression that (eventually) evaluates to an a, this expression will only be evaluated if the initial Boolean expression turns out to be false.
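One could even define such a multiplexer oneself; the following sketch is my own (the name mux is hypothetical, not anything from the standard library), but it behaves just as described:

mux :: Bool -> a -> a -> a
mux True  x _ = x
mux False _ y = y

-- mux True 1 (error "never forced") => 1
-- the branch not selected is never evaluated, thanks to laziness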

Let’s assume now that this is what happens: depth is greater than 0, and so we begin evaluating the expression under the else branch. This expression introduces yet another if then else expression, whose boolean expression now is the value legal. Now, and only now, must we dip into the where tray and find legal’s true value; this, in turn, requires evaluating children.

If, finally, legal turns out to be True, we pass forth the floating point number best. Knowing its value, of course, requires recursively populating the elements of children, and the heavy cost of populated is duly incurred.

Note, here, that my characterization of this program’s operation did not invoke concepts like time and order, and in any case the order of the lines on the page had nothing to do with the conceptual structure of the program. Indeed, a compiled Haskell program is not so much a list of instructions as a graph of invocations or dependencies, static and free of time. Our beloved Haskell creators have called it the “Spineless Tagless G-Machine”.

Things will start to get interesting when we encounter loops in this graph.

*                *                *

“Memory is a strange thing,” Louise muses, during a voiceover early on in the film. “It doesn’t work like I thought it did. We are so bound by time, by its order.”

In the concluding scenes of Arrival, Louise, together with her partner, theoretical physicist Ian Donnelly, begins to experience perplexing time-related phenomena. (I’ll try to avoid spoilers here.) Louise, for example, experiences what appears to be a time-like loop, in which events she had taken to be in her past begin to recur anew; she also experiences a sort of “reverse causality”, whereby an event in the “future” crucially informs her conduct at an “earlier” point in her life.

The point, of course, is that Louise and Ian’s knowledge of the aliens’ language has augmented their very perception of reality, allowing them to perceive time in its full complexity, complete with loops and non-linearity. The Sapir-Whorf hypothesis ends up bearing fruit.

Haskell, too, will deliver on its promises, with bizarre self-referential behavior that will defy everything we thought we knew about programming.

Take, for example, the following definition, lifted from A Gentle Introduction to Haskell:

ones             = 1 : ones

By way of explanation, the : operator in Haskell prepends the element (of type a) on its left-hand side to the front of the list (of type [a]) on its right.
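For instance:

1 : [2,3] => [1,2,3]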

I couldn’t believe my eyes when I first saw this definition. From another point of view, however, true competency in Haskell will be reflected when we can convince ourselves that this is not weird.

The first step is to point out that this is not an assignment, as a habituation to imperative languages might lead one to assume. (In an imperative assignment, the right-hand side must be fully evaluated before its value is bound to the left-hand variable; the variable ones, in this case, would of course be undefined at the time of its lookup.) It’s rather a definition, and the fact that it’s self-referential turns out to be inconsequential: Haskell doesn’t care about the shape of our program’s dependency graph.

Indeed, booting up the GHCi REPL environment, we observe the expected behavior:

take 5 ones => [1,1,1,1,1]

In a similar vein, we have the following self-referential definition:

numsFrom n       = n : numsFrom (n+1)

and the expected

take 5 (numsFrom 12) => [12,13,14,15,16]

Is this recursion we’re seeing? Not quite. In the first case, for one, ones is not even a function, but rather a (nullary) value, so speaking of traditional recursion doesn’t even make sense here. (In Haskell, function definitions differ from value definitions only in that the latter take no arguments.) The function numsFrom, likewise, should not be viewed as a recursive function that returns a list after calling an instance of itself. This “recursion” would have no base case, and computing the “value” of numsFrom 12 in an imperative setting would lead to an infinite regress and stack overflow.

The best way to conceptualize these definitions is as static definitions in which data dependencies exist, and in which these dependencies happen to be self-referential. (In a sense this was already true in the chess example above, in which the value of eval was itself defined in terms of eval. This was not quite as striking, though, as the latter instance was invoked with different parameter values, and furthermore in such a way that the base case was always safely in sight.) This is the key to laziness: when take 5 is called with one of these fellows as its argument, the argument is not “evaluated” in any sense, but merely probed until it yields the values we seek. You can even trace out the paths of the call-by-name and substitution processes.
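Here, for instance, is a rough sketch of such a trace for take 2 ones, using the standard definition of take and glossing over the precise order in which things are forced:

   take 2 ones
=> take 2 (1 : ones)        -- unfold the definition of ones once
=> 1 : take 1 ones
=> 1 : take 1 (1 : ones)    -- unfold it again
=> 1 : 1 : take 0 ones      -- take 0 demands nothing further
=> 1 : 1 : []
=> [1,1]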

I’ll end, here, with the most mind-boggling case of them all:

fib              = 1 : 1 : [ a+b | (a,b) <- zip fib (tail fib) ]

This expression’s value is an infinite list containing the Fibonacci numbers.

The right-most list is an example of a list comprehension, and is inspired by mathematical notation; it can be read as “the (ordered) list of elements a + b yielded as (a,b) ranges throughout the list zip fib (tail fib)”. The function tail is a sort of opposite of :; it drops the first element of a list and returns the rest. zip, finally, takes two lists and returns a list of pairs, matching up corresponding elements, so that

zip [1, 2, 3] ['a','b','c'] => [(1,'a'),(2,'b'),(3,'c')]
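tail, for its part, behaves like this:

tail [12,13,14] => [13,14]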

What makes this case strange is that the arguments of zip are none other than fib itself and tail fib, an immediate derivative. By zipping together these two lists—which we’re in the process of defining—and adding them up elementwise, we assemble the Fibonacci sequence.

Sure enough, we have:

take 5 fib => [1,1,2,3,5]

This works just about the same as in the previous two cases, if you trace things through. I’ve sketched a schematic outline of this tracing process in the diagram below:

Haskell can represent the Fibonacci numbers as an infinite lazy sequence.
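In rough textual form (my own sketch, writing rest as a name for the trailing comprehension and glossing over the precise order of evaluation), the unfolding looks something like this:

fib                = 1 : 1 : rest
   where rest      = [ a+b | (a,b) <- zip fib (tail fib) ]
tail fib           = 1 : rest
zip fib (tail fib) = (1,1) : (1, rest!!0) : (rest!!0, rest!!1) : ...
rest               = 1+1 : 1+2 : 2+3 : ...
                   = 2 : 3 : 5 : ...
fib                = 1 : 1 : 2 : 3 : 5 : ...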

If you can understand this, then you’re in good shape.

There’s just one more minuscule, tiny detail that I’ve left out. This, in fact, will change everything. The Gentle Introduction casually mentions that this approach calculates the Fibonacci numbers efficiently (i.e., the duration of the computation is linearly proportional to the number of terms in the sequence requested). This is certainly not true of the implementation I have sketched, which exhibits heavy branching behavior, leading to exponentially growing complexity as the number of terms we want grows. What exactly is going on?
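The branching I have in mind is essentially that of the textbook doubly recursive definition (my own illustration here, not something taken from the Gentle Introduction):

fibNaive :: Int -> Integer
fibNaive 0 = 1
fibNaive 1 = 1
fibNaive n = fibNaive (n-1) + fibNaive (n-2)
-- each call spawns two further calls, and the same subproblems
-- are recomputed over and over: exponential work in n

Traced naively, the list-based fib seems to unravel in just this doubling fashion, which is what makes the promised linear-time behavior so surprising.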

We seem to have encountered some sort of reverse causality. With each successive term we uncover, we get more information about the list fib. We should get this information, however, only after first unravelling definitions in an exponential, branching manner and then laboriously combining the results. The linear-time behavior promised in the Introduction, then, could only possibly occur if some communication were occurring in the reverse direction, namely from the settled results of past calculations, at the root of the tree, to the ongoing labor of evaluation and combination, far away at the leaves.

To understand fib in our previous model, we followed branching recursive references which pointed back to fib. It would seem, now, that the targets to which these references point do not remain as stated in our definition, but are themselves refreshed as the information gradually materializes. This would seem to require that the caller of a function know its value (at least partially) before calling it. How this happens I simply do not know.

*                *                *

“What do they want? Where are they from? Why are they here? This is the priority,” barks Colonel Weber, a US Armed Forces official, to the physicist, Ian Donnelly, as they walk into the depths of the operating base.

“Have they responded to anything? Shapes, patterns, numbers, Fibonacci?” Donnelly asks.

It would indeed be interesting to find aliens who could recognize the Fibonacci sequence. Yet only an advanced alien species whose language truly existed outside of time, I contend, could understand the Fibonacci numbers as realized as an infinite lazy sequence in purely functional Haskell.


2 comments on “Alien Languages”

  1. Ben says:

    I’ll leave here one vague, speculative comment. During my undergrad CS years, I took a few courses in “Digital Logic”. The primary exercise in these courses, especially in the more advanced ones, was to use a language called Verilog to create an entire computer chip from scratch. This “virtual” chip was later realized physically in a field-programmable gate array, an interesting device which “understood” Verilog and physically assembled the chip that the language described only virtually.

    This sort of “description” is the point of this comment. As our professor, the legendary and universally beloved Montek Singh, repeatedly insisted to us, Verilog is not a “programming language” but rather a “hardware description language”, the word description being operative. This language, indeed, facilitated the description of a computer chip, itself a static and permanent arrangement of circuits and wires. “Put an and gate here, a wire here, and an or gate here,” the language might go. And so on.

    We quickly learned that order didn’t matter in this language. If I put an and gate on the table, and then an or gate, and finally connect the two with an appropriate sequence of wires, the same effect would be achieved if I had put, say, the wire down first and then attached gates to both sides. The only thing that mattered was, at the end of the day, the sum total of connections and how they were arranged.

    My point here is that Haskell, to me, resembles this hardware description language (or the act of designing a computer processor) much more than it does traditional programming languages (or the operation of the computer processor). (It’s not a coincidence that I compared Haskell’s if then else statements to multiplexers.)

    Haskell, as this post attempts to explain, facilitates the description of a static arrangement of functions and data dependencies, much in the same way that Verilog facilitates the description of a static arrangement of circuits and wires. A bare CPU, meanwhile, is a highly imperative device, sporting an array of “registers”, which hold temporary and mutable values, and “instructions”, imperative procedures which manipulate, act on, and rewrite the values of these registers in sequence.

    The interesting, or “philosophical”, question here is, why does Haskell resemble designing a computer more than it does using a computer? Indeed, however much Haskell, in the words of Fred Brooks, builds “castles in the air”, these are ultimately compiled, by GHC, into binaries which run on imperative architectures. In this respect, the premise of Haskell seems less natural; one might as well, it would seem, stay close to the architecture.

    The answer to this question is surely deep, and the vast Haskell community could do a better job of answering it than I. I’ll offer a few suggestions. For one, the laziness of Haskell actually facilitates many optimizations that would be painful to implement in imperative languages. For example:
    (1) There’s a fairly straight-forward computer chess optimization whereby the move-generation phase is interspersed with the alpha-beta pruning phase. Under this paradigm, a child is visited recursively the instant it is created; the benefit, here, is that if a child induces an alpha-beta cutoff, not just the visitation but even the generation of the remaining children can be aborted. Fascinatingly, this optimization is implemented automatically by Haskell’s laziness, as the massive expression describing a node’s children is stored unevaluated, and is broken down only piece-by-piece as demanded by the alpha-beta algorithm. When the latter terminates, the former is discarded.
    (2) An even subtler imperative optimization—this time, of space, as opposed to time—can be achieved by interleaving the move-generation/visitation and find-max procedures. Under this framework, each child, following its creation and visitation, is immediately compared to the parent’s running best; if it beats it, it replaces it, and otherwise it is discarded. This contrasts with the naive implementation, under which all children are first generated and visited before their max is found. The saving here, of course, is that only one child needs to be stored in memory at any given time (per visitation), instead of an amount proportional to chess’s branching factor (around 35). (This optimization is still valuable even if (1) is not implemented, as, even in this case, only one child with a principal variation (or game tree) attached must be stored at any one time.) Once again, this optimization follows naturally from Haskell’s laziness, in particular that of the “find best” operation, which, libraries notwithstanding, admits a natural construction from scratch.
    These phenomena are similar to those exhibited in the isPrime example here; a rough sketch of the first, in Haskell, appears below.
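    To give a flavor of (1), here is a rough sketch of my own (the names are hypothetical; this is not the engine’s actual code): a consumer that walks a lazily produced list of scores and stops at the first cutoff, so that the untouched tail of the list, and the move generation behind it, never happens at all.

    firstAbove :: Ord a => a -> [a] -> Maybe a
    firstAbove _    []     = Nothing
    firstAbove beta (x:xs)
       | x >= beta = Just x                 -- cutoff: xs is never forced
       | otherwise = firstAbove beta xs

    -- e.g. firstAbove beta (map (eval (depth-1)) children):
    -- children beyond the first cutoff are neither evaluated nor even generated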

    Beyond examples like these, GHC supports a vast array of ingenious optimizations, a few of which are detailed in this excellent stackoverflow answer. GHC’s strength has been described as “industrial”, and for good reason. Haskell’s purity, of course, facilitates many of these optimizations, as, absent side-effects, the results of function calls with identical parameters can be re-used.

    My short answer, then, is that the strength of Haskell’s language paradigm is such as to begin to outweigh the drawbacks of veering far from the CPU’s underlying imperative architecture. Beyond this, though, I can’t say much. I simply don’t understand enough about how GHC works.

  2. Ben says:

    One final comment. I still, truly, don’t understand the Fibonacci example detailed above. But whatever is going on, I have a feeling it has something to do with sharing.
