Friday, January 25, 2008

Functional Flow Control

In normal programming languages, there are many keywords for flow control, such as for, while etc. These flow control keywords encode a particular pattern of iteration, such as looping over a range (in the case of for) or continuing until some condition holds (while). Imperative programming languages continue to add more iteration keywords: both C#/Java have introduced some form of for-each; Python/C# have yield; Ada has many variants.

Haskell doesn't have these iteration keywords, but instead relies on recursion. This choice, when coupled with a few other Haskell ingredients, makes it much more powerful. Take for example the task of looping over a sequence, adding 1 to each element:


// In C
for (int i = 0; i < n; i++)
list[i]++

-- In Haskell
incList (x:xs) = 1+x : foo xs
incList [] = []


I've used C mutating an array, and Haskell allocating a new list, simply because that would be the natural thing to do in each language. However, the great thing about higher-order functions is that we can now go back and abstract the flow control in Haskell, giving us:


incList = map (+1)


The above function maps over a list, incrementing each element. People have identified a common pattern (iterating over a list) and rather than baking it into the language with a keyword such as iterate-over-list, a library function can provide the operation. It is very important that map is not special in any way, and can simply be defined as:


map f (x:xs) = f x : map f xs
map f [] = []


The great advantage is that rather than being restricted to a limited range of flow control operators that someone somewhere decided upon, we can add new ones. Let's take another example, that of summing a list:


-- in C
int total = 0;
for (int i = 0; i < n; i++)
total += list[i];

-- in Haskell
sum [] = 0
sum (x:xs) = x + sum xs

-- or using the built in foldl
sum = foldl (+) 0


In Haskell there is a standard library function foldl which iterates over a list using an accumulator, managing updates to the accumulator for you, and setting an initial value. In C there is no such operator, so the more general purpose for is used.

But these examples are very common, so C's for keyword has provided most most of the control flow. However, sometimes you need more exotic flow control, which the authors of the language did not think of including. Take the example of computing a fixed point of a function f on the value x:


int x;
while (1) {
int x2 = f(x);
if (x == x2) break;
x = x2;
}

fix f x = if x == x2 then x else fix f x2
where x2 = f x


Here the Haskell version shows its power, instead of having defined a particular instance for a particular f and a particular type of value x, in Haskell we have basically defined fix as a new form of flow control.

In C we were still able to define something, but it was much harder. Now consider the following example that I was working on yesterday. I have an algorithm which has 4 stages, lambda, simplify, inline, specialise. Each stage must be run in turn, but if any stage changes something, then we restart from the beginning. For example, we apply lambda, then simplify - if something changes we restart at lambda. We only finish once all the stages have been run without changing anything. In Haskell this code is simple:


fixList orig x = f orig x
where
f [] x = x
f (a:as) x = if x == x2 then f as x else f orig x2
where x2 = a x

fixList [lambda,simplify,inline,specialise]


I define a new function called fixList, which provides an abstraction of flow control. The actual operations have been well isolated from this structure. I considered how to express this in structured C and drew a complete blank. My best guess is:


int x, x2;

begin:
x = x2;

x2 = lambda(x) ; if (x != x2) goto begin;
x2 = simplify(x) ; if (x != x2) goto begin;
x2 = inline(x) ; if (x != x2) goto begin;
x2 = specialise(x); if (x != x2) goto begin;


It might be possible to change some of this flow control into a macro, but I can think of no clean abstraction. Haskell is a great language for building abstractions, flow-control is just one highly useful instance.

PS: C Issues

Some of the code in this article isn't quite as nice as I'd wanted it to be, and isn't really a fair comparison. The array processing code in C relies on having defined n to be the length of the list, and having that tracked separately. The fixed point definition on C works over int to get a nice equality test, but that is merely a limitation of the language not having a standardized way to do equality. The C code could use function pointers, but in reality inline takes an extra argument so is used as a closure in Haskell - and besides, that would hardly be the standard way of coding C.

13 comments:

  1. Actually, there is a more general way of expressing this algorithm in c.

    int (*a[])(int) functions={lambda, simplify, inline, specialize};
    int i;
    int x,x2;
    for (i=0;i<sizeof functions/sizeof functions[0];) {
    x=x2;
    x2=functions[i](x);
    if (x!=x2) i=0; else i++;
    }

    Freedom of putting arbitrary tail calls is indeed as dangerous, as goto statement, and can produce spagetti code in the same way. So it can't be used as argument for haskell supreme expressive power.

    ReplyDelete
  2. Vlad: As I said in the PS, you can do it in C, but its hardly the standard and recommended way. Also, as I mentioned, inline is actually a closure so your C code wouldn't solve my actual real problem.

    As to comparing tail calls to the goto, thats probably excessive. C has tail calls in some circumstances too! Recursion is a nice principled and structured way of doing flow control, while goto is a fairly horrid way.

    ReplyDelete
  3. Can't agree with that. Having goto-stuffed C code, you can directly translate it into mutually recursive tail calls passing all (at worst) local variables, and inverse translation is also possible in most cases.
    By this reason, functions as a formally isolated blocks of code doesn't provide structural properties by itself, at least in Dijkstra's sense of structural programming. To get structural property, each block is required to have separate meaning, and composition is required to be heirarchical, allowing level-by-level understanding how does program work. But in your example function "f" obviously doesn't have such meaning: you even didn't bother to give it a substantial name (and probably it's impossible, because "f" is directly related to a particular task, specifically using outer name "orig"). And in general, ways of composition are not hierarchical too: you can call arbitrary function from arbitrary one. There is no more meaning in this function, than in a fragment between label and closest goto statement in an imperative code. Introducing this functions seems to me as dirty as introducing label in procedural program.

    My point is that comparing with C, of course haskell have additional ways of expressing (functions as first-order objects and laziness provide more powerful ways of decomposition), but recursion regarded as flow control feature is not among them.

    ReplyDelete
  4. Anonymous6:32 PM

    Recursion is definitely a flow control feature.

    ReplyDelete
  5. Anonymous8:03 PM

    This is a complete tangent, but... You don't need to use goto. Most C programmers probably wouldn't, I think...

    More likely on a more interesting data type, the function would mutate the structure in-place and return a bool (er, int) indicating whether or not they changed anything, making it:

    while (true)
    {
    if (lambda(x)) continue;
    if (simplify(x)) continue;
    if (inline(x)) continue;
    if (specialise(x)) continue;
    break;
    }

    Which gives you relatively clean (if more verbose than strictly necessary) code at the cost of less generic pieces. (Maybe this is less tangential than I thought...)

    ReplyDelete
  6. I think a more interesting comparison might be between Haskell and eagerly evaluated functional languages. In Haskell, you can directly define functions that do control-flow-like things, like the fixpoint operator, folds, and whatever you please. In Scheme, if you want to control the order of evaluation, you need to resort to macros, which are alright, but it's more code to write. I imagine the same goes for ML and the like.

    ReplyDelete
  7. If you typically have a source time specified sequence of functions to fix, I would recommend a combinator approach rather than a specialized function to fix a list of functions. Using combinators has the advantage that it will much more likely be be amenable to compiler optimizations such as inlining and contification.

    As an added bonus, it turns out that the solution will be simpler that way. You can still use the same fix function. All you need is a combinator that takes a pair of functions and pipes the result through both of them only if the first function does not change the result. Here is how the code could look like in Standard ML:

    (* The combinators: *)
    fun fix f x =
    (**)case f x
    (***)of fx =>
    (******)if fx = x
    (******)then x
    (******)else fix f fx

    infix >>

    fun (f >> g) x =
    (**)case f x
    (***)of fx =>
    (******)if fx = x
    (******)then g fx
    (******)else fx

    (* And an example: *)
    fun mk name n x =
    (**)(print (concat
    (**********)[name, ": ",
    (***********)Int.toString x,
    (***********)"\n"]) ;
    (***)if x < n
    (***)then n
    (***)else x)

    val lambda = mk "lambda" 1
    val simplify = mk "simplify" 2
    val inline = mk "inline" 3
    val specialize = mk "specialize" 4

    val 4 =
    (**)fix (lambda >>
    (*******)simplify >>
    (*******)inline >>
    (*******)specialize)
    (******)0

    Apparently this blog software has not been designed by a hacker.

    ReplyDelete
  8. Anon: while/continue is a much better way of writing it - I just didn't think of it.

    vesa: Yes, my original version did use combinators in a similar manner to what you suggest - but at some point I changed, although I'm not entirely sure why. Your combinators also have the advantage that if one stage is idempotent you can define:

    f !>> g x = g (f x)

    And still use the same structure as before. In this case, the action functions are very large so there won't be any noticable benefit from inlining etc.

    And yes - I do with the blog did code much better - pre tags might help, its how I have to do the code in the posts.

    ReplyDelete
  9. The idea of trying pre tags did not escape me. The blog complains: "Your HTML cannot be accepted: Tag is not allowed: <pre>".

    ReplyDelete
  10. It doesn't look to me like you've said anything much about recursion versus iteration - all your examples relate much more to Haskell's first-class functions.

    rkleckner: you can write fold in Scheme without macros :-) You do need macros for some other control-flow stuff, though.

    ReplyDelete
  11. Anonymous12:17 PM

    Hi, can you tell me how to go about this kind of problem with recursion:

    Write a defnition in Haskell, when passed a non-negative integer n, returns 0 + 1 + 2 + · · · + n.

    This was on an exam I did but skipped the question.

    ReplyDelete
  12. Anon: That's a fairly introductory question, with a large number of answers, some of which are covered on the Wikipedia page for Haskell. I'd recommend asking either your lecturer, or the haskell-cafe mailing list. You need both an answer, and an explanation of the answer.

    ReplyDelete
  13. while (
    ____lambda_____(x)_||
    ____simplify___(x)_||
    ____inline_____(x)_||
    ____specialise_(x)
    ){}

    I know it's pointless :)

    ReplyDelete