Monday, November 09, 2020

Turing Incomplete Languages

Summary: Some languages ban recursion to ensure programs "terminate". That's technically true, but usually irrelevant.

In my career there have been three instances where I've worked on a programming language that went through the evolution:

  1. Ban recursion and unbounded loops. Proclaim the language is "Turing incomplete" and that all programs terminate.
  2. Declare that Turing incomplete programs are simpler. Have non-technical people conflate terminate quickly with terminate eventually.
  3. Realise lacking recursion makes things incredibly clunky to express, turning simple problems into brain teasers.
  4. Add recursion.
  5. Realise that the everything is better.

Before I left university, this process would have sounded ridiculous. In fact, even after these steps happened twice I was convinced it was the kind of thing that would never happen again. Now I've got three instances, it seems worth writing a blog post so for case number four I have something to refer to.

A language without recursion or unbounded loops

First, let's consider a small simple statement-orientated first-order programming language. How might we write a non-terminating program? There are two easy ways. Firstly, write a loop - while (true) {}. Second, write recursion, void f() { f() }. We can ban both of those, leaving only bounded iteration of the form for x in xs { .. } or similar. Now the language is Turing incomplete and all programs terminate.

The lack of recursion makes programs harder to write, but we can always use an explicit stack with unbounded loops.

The lack of unbounded loops isn't a problem provided we have an upper bound on how many steps our program might take. For example, we know QuickSort has worst-case complexity O(n^2), so if we can write for x in range(0, n^2) { .. } then we'll have enough steps in our program such that we never reach the bound.

But what if our programming language doesn't even provide a range function? We can synthesise it by realising that in a linear amount of code we can produce exponentially large values. As an example:

double xs = xs ++ xs -- Double the length of a list
repeated x = double (double (double (double (double (double (double (double (double (double [x])))))))))

The function repeated 1 makes 10 calls to double, and creates a list of length 2^10 (1024). A mere 263 more calls to double and we'll have a list long enough to contain each atom in the universe. With some tweaks we can cause doubling to stop at a given bound, and generate numbers in sequence, giving us range to any bound we pick.

We now have a menu of three techniques that lets us write almost any program we want to do so:

  1. We can encoding recursion using an explicit stack.
  2. We can change unbounded loops into loops with a conservative upper bound.
  3. We can generate structures of exponential size with a linear amount of code.

The consequences

Firstly, we still don't have a Turing complete language. The code will terminate. But there is no guarantee on how long it will take to terminate. Programs that take a million years to finish technically terminate, but probably can't be run on an actual computer. For most of the domains I've seen Turing incompleteness raised, a runtime of seconds would be desirable. Turing incompleteness doesn't help at all.

Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read. While there are three general purpose techniques to encode the logic, there are usually other considerations that cause each instance to be solved differently. I've written tree traversals, sorts and parsers in such restricted languages - the result is always a lot of comments and at least one level of unnecessary indirection.

Finally, code written in such a complex style often performs significantly worse. Consider QuickSort - the standard implementation takes O(n^2) time worst case, but O(n log n) time average case, and O(log n) space (for the stack). If you take the approach of building an O(n^2) list before you start to encode a while loop, you end up with O(n^2) space and time. Moreover, while in normal QuickSort the time complexity is counting the number of cheap comparisons, in an encoded version the time complexity relates to allocations, which can be much more expensive as a constant factor.

The solution

Most languages with the standard complement of if/for etc which are Turing incomplete do not gain any benefit from this restriction. One exception is in domains where you are proving properties or doing analysis, as two examples:

  1. Dependently typed languages such as Idris, which typically have much more sophisticated termination checkers than just banning recursion and unbounded loops.
  2. Resource bounded languages such as Hume, which allow better analysis and implementation techniques by restricting how expressive the language is.

Such languages tend to be a rarity in industry. In all the Turing incomplete programming languages I've experienced, recursion was later added, programs were simplified, and programming in the language became easier.

While most languages I've worked on made this evolution in private, one language, DAML from Digital Asset, did so in public. In 2016 they wrote:

DAML was intentionally designed not to be Turing-complete. While Turing-complete languages can model any business domain, what they gain in flexibility they lose in analysability.

Whereas in 2020 their user manual says:

If there is no explicit iterator, you can use recursion. Let’s try to write a function that reverses a list, for example.

Note that while I used to work at Digital Asset, these posts both predate and postdate my time there.

18 comments:

Clément said...

This is a great post, but there are a few inaccurate things here and there:

> Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read.

It's true in many of the non-turing-complete languages, but not in, say, Dafny, F*, Lean, Coq, Agda, Idris, etc… (and not all of these are dependently typed). You do mention Idris later.

> If you take the approach of building an O(n^2) list before you start to encode a while loop, you end up with O(n^2) space and time.

Sure, but you don't always have to do that, as long as the language allows you to exit before consuming all your fuel: `for x in range(0, n^2) { .. }` doesn't have to take `n^2` steps if you allow an early `return` in the `..` (ok, arguably `range` has to produce its output lazily, too)

> Dependently typed languages such as Idris, which typically have much more sophisticated termination checkers than just banning recursion and unbounded loops.

Doesn't Idris just ban unbounded loops? Dafny might be a better example, but it's not dependently typed.

In fact, the dependent types typically don't play a particularly important role in termination checking in verification languages (they're used for general recursion, but most functions in Coq, for example, are just structurally recursive and involve purely syntactic termination checks).

Neil Mitchell said...

Thanks for the comments!

For this article I don't include Coq, Agda, Indris etc. They are carefully designed and restricted languages for a particular domain. They aren't representative of the languages I've encountered in the real world making these mistakes. I'd love a term that unambiguously describes these languages, but couldn't think of one.

As for time complexity in QuickCheck, I am totally talking about the range function. Often setting up the gas is the costly bit! In one of the three examples there was a built-in lazy range, but in the other two languages it really did cost that much to set up the gas.

Agreed that most dependent typed languages just have structural recursive and syntactic termination checks. I think that's perfectly reasonable for their domain. The restricted "Turing incomplete" languages I have used were always more restrictive than that.

Unknown said...

Fortran and Pl/I used to statically allocate memory for non-renterent functions to save a few bytes, but was abandoned when it had the opposite effect by preventing modules from being reusable: I'd always assumed that recursion was omitted because the language parser was single pass and the designer didn't know how to implement it.

Craig said...

Take a look at Dafny (I see Clément referenced it above). I wouldn't call it Turing-incomplete; you can write whatever you want in it. But it's Turing-incomplete in the useful sense. You can prove not only termination, but other useful preconditions, postconditions, and loop invariants, and you can write them in a mostly obvious way. The code becomes more readable, because what it does is clearly spelled out.

Neil Mitchell said...

Craig, I definitely wasn't thinking of languages like Dafny, or any that do meaningful proof. I agree that all those things can make programming better, if implemented well - and while I haven't used Dafny, I imagine future iterations of "real" programming languages will incorporate such tools to a greater degree.

Sam Griffin said...

I'm always interested to read criticisms of Turing-incomplete languages, because I have a strong bias towards them, and I'm not sure the breakdown that occurs when I try to evangelize them.

First, let me put forward why I care about them, and why I think all programmers should care about them. If you care about something, you should measure it. And if you want to measure something in a computing context, you should embed that measurement into the tools you work with, so you don't have to do the measurement yourself.

Thus, if you care about 1. How long it takes a computation to complete or; 2. How much memory a computation consumes; you should probably use a language that allows you to measure those things with the compiler.

Admittedly, many attempts at Turing-incomplete languages haven't yielded very pragmatic results, as you mention. But I think this is far from an insurmountable barrier (and if you disagree, I'd be glad to hear!).

It seems to me that you should be able to write code in a general-recursive style, but then convert that to primitive recursion in the back end. Much of the primitve-recursive limits can be solved through static analysis, and for those cases that can't, simply have the compiler throw an error and ask the user to annotate the offending code with desired bounds (or even resource usage!).

This seems much more preferable to me than spending large amounts of time hunting down space leaks/reading GHC-core/rewriting semantically-correct code for performance.

Thanks for your attention!

Neil Mitchell said...

Sam, I think you're arguing for something along the research lines of Hume. That sounds fantastic, although it seems like the end of a research journey, rather than a reasonable current state for production programs. None of the languages I used tried to do anything like that - they were just a fairly vanilla language but hobbled.

Dominique Devriese said...

Neil, just a small correction: dependently-typed languages like Idris are not Turing-incomplete. This is a common misunderstanding. It is false in the same way as the claim that Haskell does not support side-effects: Haskell programs can have side-effects as long as their type declares this by living in the IO monad. Similarly, it's perfectly possible to write non-terminating programs in Idris, but you must declare the potential non-termination by giving them a type that allows for it. The type of main in Idris is IO () and this type does not guarantee termination.

Anonymous said...

What about Dhall?
https://dhall-lang.org/

Neil Mitchell said...

I think Dhall will probably add recursion eventually. See http://www.haskellforall.com/2020/01/why-dhall-advertises-absence-of-turing.html where the explain its a signalling mechanism to people who don't really understand what it means, rather than a real feature. Hopefully this article (and educating the entire population) would make it useless for signaling.

bdcroteau said...

Great article. I was working on a Turing-incomplete scripting language over the past year. It is Source Available (but not Open Source yet) and in very early stages. All collections have a conservative upper bound. For loops are possible but not recursion. It supports generics, where one flavor of the generics (for example, named #T instead of T) specifies that the parameter only works as the collection upper bound. My idea was that you could send messages in this language instead of JSON, basically it's a workflow or configuration language, where people would normally use JSON or XML. It's not very fancy, a trivial language when compared to Idris.

mjrosenb said...

> realising that in a linear amount of code we can produce exponentially large values
You can do a whole lot better than exponential

double xs = xs ++ xs -- Double the length of a list
double1 xs = double (double xs) -- *4
double2 xs = double1 (double1 xs) -- *16
double3 xs = double2 (double2 xs) -- *256

Somewhere around double8 it'll exceed the number of atoms in the universe. Not actually relevant to the points that you made, just if we're going to do something inefficient like generate bounds for long running computations, we might as well do it efficiently

Dominique Devriese said...

Neil,

Interesting article.
It might be interesting to point out that the reverse has also happened: initial designs for Turing-complete languages that have later been replaced with Turing-incomplete ones.
C++'s type-level meta-programming was one of the first widely-used type-level computation languages and it was Turing-complete.
Many people now consider this Turing-completeness a mistake because it makes type-checking potentially non-terminating.
More recent designs for type-level computation (e.g. Haskell, Scala etc.) are explicitly not Turing-complete using termination checks.
Many people, but not everyone, in these communities seem to agree that type-level computation languages should be total.

See you,
Dominique

Neil Mitchell said...

@mjrosenb: Yep, that's a way of producing even more values with even less code :)

Dominique: The Hindley Milner type checking algorithm is exponential on its own, before you even add in type-level computation - https://groups.google.com/g/comp.lang.functional/c/o3eP-_0lvKM. I'm broadly a fan of types that terminate, but I wouldn't call it a general purpose language, merely one that has been abused as such.

Bernhard Elsner said...

Neil, I thought I'd add some colour to the DAML example. My recollection (though I also joined Digital Asset after 2016) is that the choice to omit recursion was not driven by a desire to guarantee termination, but by a wish to prove some other properties statically:

Until some time in 2018, the language had a small static analysis tool next to the compiler that guaranteed that no authorisation errors could occur at runtime. I'd argue that that feature would have added more value than simply guaranteeing un-bounded eventual termination, but even so your argumentation held.

The value value add you get only balances the inconvenience of a "hobbled" language in few specialized use-cases.

Neil Mitchell said...

@Bernhard: Yes, it was driven by the obligations checker originally. Although some in the marketing department definitely ran with the termination argument very strongly. One of the earliest things I did before arguing for recursion was showing that recursion wasn't a problem for the obligations checker - because of the way it worked (it ignored most data, focusing on reachable statements, and recursion doesn't actually enable you to get to any new statements).

Anonymous said...

The Linux kernel's extended Berkley packet filter (eBPF) has no unbounded looping, and probably never will. (The wikipedia entry mentions loops being allowed from 5.3, but https://lwn.net/Articles/794934/ has more info; turns out they're bounded).

It allows user code to be run in the kernel, and so it's unlikely to ever allow looping :)

Neil Mitchell said...

@Anon:

* Initially BPF didn't allow any jumps backwards and had very small number of instructions limit.
* Then they raised the limit.
* Then they added bounded loops.

I think where they stand at the moment is similar to Hume. However, I suspect the future evolution will be:

* Realise that the static approximation isn't precise enough (1K page faults is too much, 100K additions is not enough).
* Move to a dynamic bound (e.g. interrupt, background worker) which can be opted in for some workflows.
* Permit unbounded loops for those runtime-bounded BPF.

So I'd say there are meaningful static guarantees to eBPF. And I think that in 5 years time they'll have been relaxed in some cases.