Neil Mitchell's Blog (Haskell etc)

Wednesday, May 04, 2022

Working on build systems full-time at Meta

Summary: I joined Meta 2.5 years ago to work on build systems. I’m enjoying it.

I joined Meta over two years ago when an opportunity arose to work on build systems full time. I started the Shake build system at Standard Chartered over 10 years ago, and then wrote an open source version a few years later. Since then, I’ve always been dabbling in build systems, at both moderate and small scale. I really enjoyed writing the Build Systems a la Carte paper, and as a result, started to appreciate some of the Bazel and Buck design decisions. I was involved in the Bazel work at Digital Asset, and after that decided that there was still lots of work to be done on build systems. I did some work on Cloud Shake, but the fact that I wasn’t working on it every day, and that I wasn’t personally using it, made it hard to productionize. A former colleague now at Meta reached out and invited me for breakfast — one thing led to another, and I ended up at Meta working on build systems full time.

What I’ve learnt about build systems

The biggest challenge at Meta is the scale. When I joined they already used the Buck build system, which had been developed at Meta. Looking at the first milliseconds after a user starts an incremental build is illustrative:

With Shake, it starts the process, loads the database into memory, walks the entire graph calling stat on each input and runs any build actions.
With Buck, it connects to a running daemon, talks to a running file watcher (Watchman in the case of Buck) and uses reverse dependencies to jump straight to the actions that need running.

For Shake, on repos with 100K files, that process might take ~0.5s, but it is O(n). If you increase to 10M files, it takes 50s, and your users will revolt. With Buck, the overhead is proportional to the number of changed files, which is usually a handful.

While Shake is clearly infeasible at the scale of Meta, Buck was also starting to show its age, and I’ve been working with others to significantly improve Buck, borrowing lessons from everywhere, including Shake. Buck also addresses problems that Shake doesn’t, such as how to cope with multi-configuration builds (e.g. building for x86 and ARM simultaneously), having a separate file and target namespace and effective use of remote execution and caching.

We expect that the new version of Buck will be released open source soon, at which point I’ll definitely be talking more about the design and engineering trade-offs behind it.

What's different moving from finance to tech

My career to date has been in finance, so working at Meta is a very different world. Below are a few things that stand out (I believe most of these are common to other big tech companies too, but Meta is my first one).

Engineering career ladder: In finance the promotion path for a good engineer is to become a manager of engineers, then a manager of managers, and so on up. In my previous few roles I was indeed managing teams, which included setting technical direction and doing coding. At Meta, managers look after people, and help set the team direction. Engineers look after code and services, and set the technical direction. But importantly, you can be promoted as an engineer, without gaining direct reports, and the opportunities and compensation are equivalent to that for managers. There are definitely aspects of management that I like (e.g. mentoring, career growth, starting collaborations), and happily all of these are things engineers can still engage in.

Programmer centric culture: In finance the company is often built around traders and sales people. In tech, the company is built around programmers, which is visible in the culture. There are hardware vending machines, free food, free ice cream, minimal approvals. They’ve done a very good job of providing a relaxing and welcoming environment (with open plan offices, but I don’t mind that aspect). The one complaint I had was that Meta used to have a pretty poor work from home policy, but that’s now been completely rewritten and is now very good.

Reduced hierarchy: I think this may be more true of Meta than other tech, but there is very minimal hierarchy. Programmers are all just programmers, not “senior” or “junior”. I don’t have the power to tell anyone what to do, but in a slightly odd way, my manager doesn’t have that power either. If I want someone to tackle a bug, I have to justify that it is a worthwhile thing to do. One consequence of that is that the ability to form relationships and influence people is much more important. Another consequence that I didn’t foresee is that working with people in different teams is very similar to working with people in your team, since exactly the same skills apply. I can message any engineer at Meta, about random ideas and possible collaborations, and everyone is happy to talk.

Migration is harder: In previous places I worked, if we needed 100 files moved to a new version of a library, someone got told to do it, and they went away and spent a lot of time doing it. At Meta that’s a lot harder — firstly, it’s probably 100K files due to the larger scale, and secondly, telling someone they must do something is a lot less effective. That means there is a greater focus on automation (automatically editing the files), compatibility (doesn’t require editing the files) and benefits (ensuring that moving to the new version of the library will make your life better). All those are definitely better ways to tackle the problem, but sometimes, work must be done that is tedious and time consuming, and that is harder to make happen.

Open source: The process for open sourcing an internal library or tool in the developer infrastructure space is very smooth. The team I work in has open sourced the Starlark programming language (taking over maintenance from Google), the Gazebo Rust utility library and a Rust linter, plus we have a few more projects in the pipeline. As I write code in the internal Meta monorepo, it gets sync’d to GitHub a few minutes later. It’s also easy to contribute to open source projects, e.g. Meta engineers have contributed to my projects such as Hoogle (before I even considered joining Meta).

Hiring: Meta hires a lot of engineers (e.g. 1,000 additional people in London). That means that interviews are more like a production line, with a desire to have a repeatable process, where candidates are assigned teams after the interviews, rather than interviewing with a team. There are upsides and downsides to that—if I interview a strong candidate, it’s really sad to know that I probably won’t get to work closely with them. It also means that the interview process is determined centrally, so I can’t follow my preferences. But it does mean that if a friend is looking for a job there’s often something available for them (you can find details on compilers and programming here and a full list of jobs here), and a repeatable process is good for fairness.

Overall I’m certainly very happy to be working on build systems. The build system is the thing that stands between a user and trying out their changes, so anything I can do to make that process better benefits all developers. I’m very excited to share what I’ve been working on more widely in the near future!

(Disclosure: This blog post had to go through Meta internal review, because it’s an employee talking about Meta, but other than typos, came out unchanged.)

Thursday, September 16, 2021

Huge Project Build Systems

Summary: Shake won't scale to millions of files; this post says what would be required to make it do so.

While Shake has compiled projects with hundreds of thousands of files, it's never scaled out to millions of files, and it would be unlikely to work well at that size. The most popular build systems that operate at that scale are Buck (from Facebook) and Bazel (from Google). In this post I go through the changes that would need to be made to make Shake scale.

The first issue is covered in my previous post, that Shake doesn't know if you change the build rules themselves. As you scale up, it becomes much more important that if you change the rules, everything is properly tracked. As the number of people involved in a project increases, the rate at which the build system changes will also increase. Both Buck and Bazel solve this problem using a deterministic Python-based configuration language called Starlark. If Shake stopped being a Haskell DSL, I'd argue that it stops being Shake and becomes something different, so it's unclear what could be done there.

The next issue is that every time Shake is invoked, it checks the modification time of every file, and then walks the entire dependency graph. That works fine at 10K files, but as you move to 1M files, it takes too long. The solution is two-fold: first, be informed which files have changed using notification APIs (e.g. the Watchman tool), and then use reverse dependencies to explore only the portion of the graph that has changed. Happily, Pepe already has a patch adding reverse dependencies to Shake, so that isn't too infeasible.
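As a minimal sketch of the reverse dependency idea (not the code from Shake, Buck or the patch mentioned above), the graph can be inverted once and then walked only from the changed files. The names Graph, reverseDeps, dirty and example are made up for illustration, and the sketch assumes the containers package for Data.Map and Data.Set.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Forward edges: each target lists the things it depends on.
type Graph = Map.Map String [String]

-- Invert the edges, so each node maps to the targets that depend on it.
reverseDeps :: Graph -> Map.Map String [String]
reverseDeps g = Map.fromListWith (++)
    [(dep, [target]) | (target, deps) <- Map.toList g, dep <- deps]

-- Everything reachable from the changed files by following reverse edges.
-- Only the affected part of the graph is visited, not the whole graph.
dirty :: Graph -> [String] -> Set.Set String
dirty g = go Set.empty
  where
    rdeps = reverseDeps g
    go seen [] = seen
    go seen (x:xs)
        | x `Set.member` seen = go seen xs
        | otherwise = go (Set.insert x seen) (Map.findWithDefault [] x rdeps ++ xs)

-- A hypothetical three-rule C project.
example :: Graph
example = Map.fromList
    [ ("main", ["main.o", "util.o"])
    , ("main.o", ["main.c", "util.h"])
    , ("util.o", ["util.c", "util.h"])
    ]
```

In GHCi, Set.toList (dirty example ["util.c"]) gives ["main","util.c","util.o"], so main.o is never looked at. A real system would keep the inverted graph around between builds rather than recomputing it.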

The final issue is that Shake was designed as a single-machine build system, not for sharing results between users. When I first wrote Shake, I didn't have access to servers, and AWS was brand new. Now, over a decade later, servers are easy to obtain and large scale build systems need to share results, so that if one user builds a file, no one else needs to. Within the realm of multi-user build systems, there are two basic operations - sharing results and offloading commands.

Shake, with its new cloud features, is able to share results between users using a shared drive. It works, and big companies are using it for real, but I'd consider it fairly experimental. For execution, Shake is unable to run actions remotely, so can't make use of something like Bazel's remote execution API. Since dependencies are specified at the rule level, and remote execution operates at the command level, there is a bit of a mismatch, and it's unclear what that might look like in Shake.

While Shake won't work at huge scales, it is still quite an effective build system at quite large scales. But, given the limitations, I imagine it will never get to the scale of Buck/Bazel. At the same time, Buck/Bazel lack dynamic dependencies, which makes them unable to express rules such as Haskell effectively.

Happily, I am involved with a new build system, the next generation of Buck. I joined Facebook two years ago, and since that time have been focused on this project. It's written in Rust, configured with Starlark (I've spent a lot of time working on an open-source Starlark interpreter in Rust), and should work at huge scales. It's not yet open source, but it will be - we are targeting early next year.

I think Shake is still a build system with a lot to offer, and continue to maintain and improve it. For people who want to scale beyond the range of Shake, I'd definitely recommend using the next generation of Buck, once it is available.

Wednesday, September 15, 2021

Small Project Build Systems

Summary: Forward build systems might work better for small projects.

Yesterday's post talked about how Shake is a good medium sized build system - but what about smaller projects? Is the Shake model right for them? Shake can be considered a backwards build system. Each rule says how to produce a file, given some input files (which are dependencies) and an action. Almost all build systems (e.g. Make, Buck, Bazel, CMake, SCons, Ninja) fit this model, which is analysed in the Build Systems a la Carte paper. While this model works, it has two disadvantages:

You have to explicitly list dependencies, or infer them from include files etc. That means either dependencies are insufficient (you probably forgot some), or they are excessive (you added some you don't need). Usually both.
You have to think backwards. When you ask someone how to build an executable from a C file, no one talks about linking first, but to program a build system you have to.

The alternative to a backwards build system is a forwards build system, of which Memoize was the first. You just write out the commands in order, and dependency tracing figures out if they have changed. To compile a C program it can be as simple as:

gcc -c util.c
gcc -c main.c
gcc -o main main.o util.o

That build script is incredibly simple - so simple it could also be treated as a shell script.
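To make the "figures out if they have changed" step concrete, here is a rough sketch of the decision a forward build system makes before each command. It is not how Memoize or Rattle are implemented: the tracing itself is assumed to have already happened, hashes are stand-in integers, and the names History, needsRun, history and current are invented for the example.

```haskell
type Cmd = String
type Hash = Int  -- stand-in for a real content hash

-- What tracing recorded on the previous run: the files each command read,
-- together with the hash each file had at the time.
type History = [(Cmd, [(FilePath, Hash)])]

-- Rerun a command if it has never been traced, or if any file it read
-- now has a different hash (or has disappeared).
needsRun :: History -> [(FilePath, Hash)] -> Cmd -> Bool
needsRun hist now cmd = case lookup cmd hist of
    Nothing -> True
    Just reads -> any (\(file, old) -> lookup file now /= Just old) reads

-- Hypothetical state: both compiles were traced on the last run.
history :: History
history =
    [ ("gcc -c util.c", [("util.c", 1)])
    , ("gcc -c main.c", [("main.c", 2)])
    ]

-- util.c has been edited since then; main.c has not.
current :: [(FilePath, Hash)]
current = [("util.c", 7), ("main.c", 2)]
```

With this state, needsRun history current "gcc -c util.c" is True and needsRun history current "gcc -c main.c" is False. A real system would make the check in script order, after earlier commands have rewritten their outputs, so the link step sees the new util.o.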

A few years ago I wrote such a system, called Rattle, and wrote a paper about it at OOPSLA 2020 with my co-authors Sarah Spall and Sam Tobin-Hochstadt. Sarah gave a talk about Rattle at OOPSLA, and I gave a talk at Build Meetup 2021. We were able to compile projects like NodeJS faster than the NodeJS build system (which uses Make), showing the idea might be feasible.

If forward build systems are so great, why do I think they are most suitable for small projects? There are four reasons, the first three of which have mitigations, but the final one sets a limit on the size at which forward build systems are suitable.

Forward build systems rely on tracing which files are dependencies of a command. Doing that quickly in a cross-platform manner is a nightmare. There are tricks like hooking system calls etc, but it presents a significant engineering hurdle, especially on macOS, which makes this task harder with every release.
Forward build systems are immature. The earliest examples no longer work. Rattle is a relatively practical research system - it could evolve into a production system - but it's not there yet. And compared to the alternatives, Rattle is probably one of the closest to production, in large part because it builds off a lot of common infrastructure from Shake which is much more mature.
Forward build systems lack parallelism, since if you want to express parallelism, you need to think about dependencies once more, and it's easy to go wrong. Rattle mostly solves the parallelism by automatically inferring when it is safe to parallelise, which is how we were able to remain competitive with Make.

And finally, the biggest issue is that forward build systems are not compositional, while backward build systems are. If you want to write a 1 million build rule system, in a backwards system, each rule looks like any other. Whereas in a forward build system, assuming you need to give an order, writing down that order in a compositional way is hard - in fact, whenever I've tried it, you start expressing the dependencies between entries and end up with a backwards build system.

Happily, people are continuing to research forward build systems. Rattle adds parallelism, Stroll removes the requirement for an order, Fac allows some dependencies and infers the remaining ones, LaForge finds greater incrementality. Perhaps all those ideas can be combined, along with a lot of engineering, to produce a practical forward build system.

Rattle has shown a well engineered forward build system would be feasible for small projects. It's unclear how much larger the concept might be able to scale, probably never to millions of files, but for small projects it might provide a significantly lower effort path to writing build systems.

Tuesday, September 14, 2021

Reflecting on the Shake Build System

Summary: As a medium-sized build system, Shake has some good bits and some bad bits.

I first developed the Shake build system at Standard Chartered in 2008, rewriting an open source version in my spare time in 2011. I wrote a paper on Shake for ICFP 2012 and then clarified some of the details in a JFP 2020 paper. Looking back, over a decade later, this post discusses what went well and what could be improved.

The first thing to note is that Shake is a medium sized build system. If you have either 1 source file or 1 million source files, Shake probably isn't a good fit. In this post I'm going to go through how Shake does as a medium-sized build system, and two other posts reflect on what I think a small build system or huge build system might look like.

The most important thing Shake got right was adding monadic/dynamic dependencies. Most build systems start with a static graph, and then, realising that can't express the real world, start hacking in an unprincipled manner. The resulting system becomes a bunch of special cases. Shake embraced dynamic dependencies. That makes some things harder (no static cycle detection, less obvious parallelism, must store dependency edges), but all those make Shake itself harder to write, while dynamic dependencies make Shake easier to use. I hope that eventually all build systems gain dynamic dependencies.

In addition to dynamic dependencies, Shake has early cut-off, meaning files that rebuild but don't change don't invalidate rules that depend upon them. This feature is something increasingly becoming standard in build systems, which is great to see.

Shake is written as a Haskell DSL, which means users of Shake are writing a Haskell program that happens to heavily leverage the Shake library. That choice was a double-edged sword. There are some significant advantages:

I didn't need to invent a special purpose language. That means I get to reuse existing tooling, existing learning materials, and existing libraries.
Since Shake is just a library, it can be documented with the normal tools like Haddock.
Users can extend Shake using Haskell, and publish libraries building on top of Shake.
The modelling of monadic dependencies in Haskell is pretty elegant, given a dedicated syntax for expressing monadic computations (the do keyword).
Projects like Haskell Language Server can build on top of Shake in fairly fundamental ways. See our recent IFL 2020 paper for the benefits that brought.

But there are also some downsides:

Most significantly, the audience of Shake is somewhat limited by the fact that Shake users probably have to learn some Haskell. While the user manual aims to teach enough Haskell to write Shake without really knowing Haskell, it's still a barrier.
Haskell has some significant weaknesses, e.g. it has two competing package managers, countless distribution mechanisms, and none of these things are consistent for long periods of time. Haskell has a poor on-ramp, and thus so does Shake.

The choice of an embedded DSL for Shake also leads to the issue that Shake doesn't know when a rule has changed, since a rule is opaque Haskell code. As a consequence, if you modify a command line in a .hs file, Shake is unaware and won't rebuild the necessary files. There are a bunch of techniques for dealing with this limitation (see the Shake functions shakeVersion, versioned), but none are pleasant, and it remains an easy mistake to make. A potential way out is to build a system which reads configuration files not in Haskell and interprets them, which I gave a talk about, and I've seen deployed in practice. But it's something where each user ends up rolling their own.

Another limitation is that Shake is (deliberately) quite low-level. It gives you a way to depend on a file, and a way to run a command line. It doesn't give you a way to express a C++ library. The hope from the beginning was that Shake would be language neutral, and that libraries would arise that built on top of Shake providing access to standard libraries. If you were writing a Python/C++/Ruby build script, you'd simply import those libraries, mix them together, and have a working build system. There are libraries that have gone in that direction: shake-language-c and shake-cpp provide C++ rules, and avr-shake lets you work with AVR Crosspack. Unfortunately, there aren't enough libraries to just plug together a build system. I think a fundamental problem is that it's not immediately obvious how such libraries would compose, and without that composability, it's hard to build the libraries that would rely on composability.

Finally, the most surprising aspect about developing Shake is that a large part of the effort has gone into writing an ergonomic and cross-platform process executor. The result is found at Development.Shake.Command, and can be used outside Shake, letting users write:

cmd "gcc -c" [src]

This example invokes gcc, ensuring that src is properly escaped if it has spaces or other special characters. A significant amount of the engineering work in Shake has gone into that facility, when it's totally orthogonal to the build system itself.

In the next two parts of this series, I'll go through why I think Shake isn't a great build system for tiny projects (and what might be), followed by why Shake isn't great for absolutely huge projects (and what would need to be fixed).

Sunday, January 17, 2021

Recording video

Summary: Use OBS, Camo and Audacity.

I recently needed to record a presentation which had slides and my face combined, using a Mac. Based on suggestions from friends and searching the web, I came up with a recipe that worked reasonably well. I'm writing this down to both share that recipe, and so I can reuse the recipe next time.

Slide design: I used a slide template which had a vertical rectangle hole at the bottom left so I could overlay a picture of my video. It took a while to find a slide design that looked plausible, and make sure callouts/quotes etc didn't overlap into this area.

Camera: The best camera you have is probably the one on your phone. To hook up my iPhone to my Mac I used a £20 lightning to USB-C cable (next day shipping from Apple) along with the software Camo. I found Camo delightfully easy to use. I paid £5 per month to disable the logo and because I wanted to try out the portrait mode to blur my background - but that mode kept randomly blurring and unblurring things in the background, so I didn't use it. Camo is useful, but I record videos infrequently, and £5/month is way too steep. I'm not a fan of software subscriptions, so I'll remember to cancel Camo. Because it is subscription based, and subscribing/cancelling is a hassle, I'll probably just suck up the logo next time.

Composition: To put it all together I used OBS Studio. The lack of an undo feature is a bit annoying (click carefully), but otherwise everything was pretty smooth. I put my slide deck (in Keynote) on one monitor, and then had OBS grab the slide contents from it. I didn't use presentation mode in Keynote as that takes over all the screen, so I just used the slide editing view, with OBS cropping to the slide contents. One annoyance of slide editing view is that spelling mistakes (and variable names etc.) have red dotted underlines, so I had to go through every slide and make sure the spellings were ignored. Grabbing the video from Camo into OBS was very easy.

Camera angle: To get the best camera angle I used a lighting plus phone stand (which contains an impressive array of stands, clips, extensions etc) I'd already bought to position the camera right in front of me. Unfortunately, putting the camera right in front of me made it hard to see the screen, which is what I use to present from. It was awkward, and I had to make a real effort to ensure I kept looking into the camera - using my reflection on the back of the shiny iPhone to make sure I kept in the right position. Even then, watching the video after, you can see my eyes dart to the screen to read the next slide. There must be something better out there - or maybe it's only a problem if you're thinking about it and most people won't notice.

Recording: For actual recording there are two approaches - record perfectly in one take (which may take many tries, or accepting a lower quality) or repeatedly record each section and edit it together after. I decided to go for a single take, which meant that if a few slides through I stumbled then I restarted. Looking at my output directory, I see 15 real takes, with a combined total of about an hour runtime, for a 20 minute talk. I did two complete run throughs, one before I noticed that spelling mistakes were underlined in dotted red.

Conversion to MP4: OBS records files as .mkv, so I used VLC to preview them. When I was happy with the result, I converted the file to .mp4 using the OBS feature "Remux recordings".

Audio post processing: After listening to the audio, there was a clear background hum, I suspect from the fan of the laptop. I removed that using Audacity. Getting Audacity to open a .mp4 file was a bit of an uphill struggle, following this guide. I then cleaned up the audio using this guide, saved it as .wav, and reintegrated it with the video using ffmpeg and this guide. I was amazed and impressed how well Audacity was able to clean up the audio with no manual adjustment.

Sharing: I shared the resulting video via Dropbox. However, when sharing via Dropbox I noticed that the audio quality was significantly degraded in the Dropbox preview on the iOS app. Be sure to download the file to assess whether the audio quality is adequate (it was fine when downloaded).

Sunday, November 15, 2020

Data types for build system dependencies

Summary: Monadic and early cut-off? Use a sequence of sets.

In the Build Systems a la Carte paper we talk about the expressive power of various types of build systems. We deliberately simplify away parallelism and implementation concerns, but those details matter. In this post I'm going to discuss some of those details, specifically the representation of dependencies.

Applicative build systems

In an applicative build system like Make, all dependencies for a target are known before you start executing the associated action. That means the dependencies have no ordering, so are best represented as a set. However, because they can be calculated from the target, they don't usually need to be stored separately. The dependencies can also be evaluated in parallel. To build a target you evaluate the dependencies to values, then evaluate the action.

Early cut-off is when an action is skipped because none of its dependencies have changed value, even if some dependencies might have required recomputing. This optimisation can be incredibly important for build systems with generated code - potentially seconds vs hours of build time. To obtain early cut-off in applicative systems, after evaluating the dependencies you compare them to the previous results, and only run the action if there were changes.

Monadic build systems

In monadic build systems like Shake, the representation of dependencies is more complex. If you have an alternative mechanism of detecting whether a rule is dirty (e.g. reverse dependencies) you don't need to record the dependencies at all. If the key is dirty, you start executing the action, and that will request the dependencies it needs. The action can then suspend, calculate the dependencies, and continue.

If you want early cut-off in a monadic build system, you need to rerun the dependencies in advance, and if they all have the same result, skip rerunning the action. Importantly, you probably want to rerun the dependencies in the same order that the action originally requested them -- otherwise you might pay a severe and unnecessary time penalty. As an example, let's consider an action:

opt <- need "is_optimised"
object <- if opt then need "foo.optimised" else need "foo.unoptimised"
link object

This rule is monadic, as whether you need the optimised or unoptimised dependency depends on the result of calculating some is_optimised property. If on the first run is_optimised is True, then we build foo.optimised. On the second run, if is_optimised is False, it is important we don't build foo.optimised as that might take a seriously long time and be entirely redundant. Therefore, it's important when checking for early cut-off we build in the order that the previous action requested the dependencies, and stop on the first difference we encounter.

(If you have unlimited resources, e.g. remote execution, it might be profitable to evaluate everything in parallel - but we're assuming that isn't the case here.)

Provided a rule performs identically between runs (i.e. is deterministic and hasn't been changed), everything that we request to check for early cut-off will still be needed for real, and we won't have wasted any work. For all these reasons, it is important to store dependencies as a sequence (e.g. a list/vector).

Monadic build systems plus parallelism

Applicative build systems naturally request all their dependencies in parallel, but monadic build systems are naturally one dependency at a time. To regain parallelism, in build systems like Shake the primitive dependency requesting mechanism takes a set of dependencies that are computed in parallel. While requesting dependencies individually or in bulk gives the same result, in bulk gives significantly more parallelism. (In Shake we use lists to track correspondence between requests and results, but it's morally a set.)

As we saw previously, it is still important that for early cut-off you reproduce the dependencies much like they were in the action. That means you request dependencies in the order they were requested, and when they were requested in bulk, they are also checked in bulk. Now we have a sequence of sets to represent dependencies, where the elements of the sets can be checked in parallel, but the sequence must be checked in order.
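A small sketch of that check, under some simplifying assumptions: keys and values are plain strings, the current value of a key comes from a pure function rather than a real build, and each group is compared sequentially where a real system would evaluate it in parallel. The names check, stored, flipped and same are made up for the example and are not Shake's API.

```haskell
type Key = String
type Value = String

-- Replay stored dependencies: the outer list in order, each inner list as
-- one group. Returns whether everything was unchanged, plus the keys that
-- had to be brought up to date along the way.
check :: (Key -> Value) -> [[(Key, Value)]] -> (Bool, [Key])
check _ [] = (True, [])
check current (grp : rest)
    | map (current . fst) grp == map snd grp =
        let (ok, built) = check current rest in (ok, keys ++ built)
    | otherwise = (False, keys)  -- stop at the first difference
  where
    keys = map fst grp

-- Dependencies recorded when is_optimised was True.
stored :: [[(Key, Value)]]
stored = [[("is_optimised", "True")], [("foo.optimised", "v1")]]

-- The world after is_optimised flips to False.
flipped :: Key -> Value
flipped "is_optimised" = "False"
flipped _ = "v1"

-- The world where nothing has changed.
same :: Key -> Value
same "is_optimised" = "True"
same _ = "v1"
```

Here check flipped stored gives (False,["is_optimised"]), so foo.optimised is never requested, while check same stored gives (True,["is_optimised","foo.optimised"]) and the action can be skipped.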

Monadic build systems plus explicit parallelism

What if we add an explicit parallelism operator to a monadic build system, something like parallel :: [Action a] -> Action [a] to run arbitrary actions in parallel (which is what Shake provides)? Now, instead of a sequence of sets, we have a tree of parallelism. As before it's important when replaying that the dependencies are requested in order, but also that as much is requested in parallel as possible.

What Shake does

Shake is a monadic build system with early cut-off, parallelism and explicit parallelism. When building up dependencies it uses a tree representation. The full data type is:

data Traces
    = None
    | One Trace
    | Sequence Traces Traces
    | Parallel [Traces]

Sequenced dependencies are represented with Sequence and the traces captured by parallelism use Parallel. Importantly, constructing Traces values is nicely O(1) in all cases. (Shake v0.19.1 used a different representation and repeatedly normalised it, which could have awful time complexity - potentially O(2^n) in pathological cases.)

While these traces store complete information, actually evaluating that trace when checking for rebuilds would be complicated. Instead, we flatten that representation to [[Trace]] for writing to the Shake database. The outer list is a sequence, the inner list is morally a set. We have the invariant that no Trace value will occur multiple times, since if you depend on something once, and then again, the second dependency was irrelevant. To flatten Parallel computations we take the first required dependency in each parallel action, merge them together, and then repeat for the subsequent actions. If you run code like:

parallel [
    -- void (from Control.Monad) gives both branches the type Action ()
    need ["a"] >> void (parallel [need ["b"], need ["c"]]),
    need ["d"]
]

It will get flattened to appear as though you wrote need ["a","d"] >> need ["b","c"]. When checking, it will delay the evaluation of b and c until after d completes, even though that is unnecessary. But simplifying traces at the cost of marginally less rebuild parallelism for those who use explicit parallelism (which is not many) seems like the right trade-off for Shake.
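The flattening described above can be sketched as follows. This is my reading of the description rather than the code in Shake: Trace is replaced by a plain String, and flatten, zipLonger, dedupe and example are names chosen for the illustration.

```haskell
import Data.List (nub)

type Trace = String  -- stand-in for Shake's real Trace type

data Traces
    = None
    | One Trace
    | Sequence Traces Traces
    | Parallel [Traces]

-- Flatten to a sequence of groups. Parallel branches are merged step by
-- step: all the first groups together, then all the second groups, etc.
flatten :: Traces -> [[Trace]]
flatten = dedupe . go
  where
    go None = []
    go (One t) = [[t]]
    go (Sequence a b) = go a ++ go b
    go (Parallel ts) = foldr (zipLonger . go) [] ts

-- Merge two sequences position by position, keeping the longer tail.
zipLonger :: [[a]] -> [[a]] -> [[a]]
zipLonger (x:xs) (y:ys) = (x ++ y) : zipLonger xs ys
zipLonger xs [] = xs
zipLonger [] ys = ys

-- Keep only the first occurrence of each trace, dropping empty groups.
dedupe :: [[Trace]] -> [[Trace]]
dedupe = filter (not . null) . go []
  where
    go _ [] = []
    go seen (g:gs) =
        let g' = filter (`notElem` seen) (nub g)
        in g' : go (seen ++ g') gs

-- The traces produced by the parallel example above.
example :: Traces
example = Parallel
    [ Sequence (One "a") (Parallel [One "b", One "c"])
    , One "d"
    ]
```

In GHCi, flatten example evaluates to [["a","d"],["b","c"]], matching the need ["a","d"] >> need ["b","c"] reading. The list-based dedupe is quadratic, which is fine for a sketch but not something to use on large traces.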

Conclusion

Applicative build systems should use sets for their dependencies. Monadic build systems should use sets, but if they support early cut-off, should use sequences of sets.

Monday, November 09, 2020

Turing Incomplete Languages

Summary: Some languages ban recursion to ensure programs "terminate". That's technically true, but usually irrelevant.

In my career there have been three instances where I've worked on a programming language that went through the evolution:

Ban recursion and unbounded loops. Proclaim the language is "Turing incomplete" and that all programs terminate.
Declare that Turing incomplete programs are simpler. Have non-technical people conflate terminate quickly with terminate eventually.
Realise lacking recursion makes things incredibly clunky to express, turning simple problems into brain teasers.
Add recursion.
Realise that everything is better.

Before I left university, this process would have sounded ridiculous. In fact, even after these steps happened twice I was convinced it was the kind of thing that would never happen again. Now I've got three instances, it seems worth writing a blog post so for case number four I have something to refer to.

A language without recursion or unbounded loops

First, let's consider a small simple statement-orientated first-order programming language. How might we write a non-terminating program? There are two easy ways. Firstly, write a loop - while (true) {}. Secondly, write recursion, void f() { f() }. We can ban both of those, leaving only bounded iteration of the form for x in xs { .. } or similar. Now the language is Turing incomplete and all programs terminate.

The lack of recursion makes programs harder to write, but we can always use an explicit stack with unbounded loops.

The lack of unbounded loops isn't a problem provided we have an upper bound on how many steps our program might take. For example, we know QuickSort has worst-case complexity O(n^2), so if we can write for x in range(0, n^2) { .. } then we'll have enough steps in our program such that we never reach the bound.

But what if our programming language doesn't even provide a range function? We can synthesise it by realising that in a linear amount of code we can produce exponentially large values. As an example:

double xs = xs ++ xs -- Double the length of a list
repeated x = double (double (double (double (double (double (double (double (double (double [x])))))))))

The expression repeated 1 makes 10 calls to double, and creates a list of length 2^10 (1024). A mere 263 more calls to double and we'll have a list long enough to contain each atom in the universe. With some tweaks we can cause doubling to stop at a given bound, and generate numbers in sequence, giving us range to any bound we pick.
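One possible reading of those "tweaks", as a sketch rather than a definitive encoding: each doubling step numbers the new half of the list, and take cuts the result off at the requested bound. The names step and range are invented here, and map, length and take stand in for the bounded iteration the restricted language would provide.

```haskell
-- One doubling step that also numbers the newly added elements:
-- [0] becomes [0,1], then [0,1,2,3], and so on.
step :: [Int] -> [Int]
step xs = xs ++ map (+ length xs) xs

-- Ten nested calls produce [0 .. 1023]; take stops at the requested bound.
-- No recursion or unbounded loop is written anywhere in this code.
range :: Int -> [Int]
range n = take n (step (step (step (step (step (step (step (step (step (step [0]))))))))))
```

With ten calls the bound can be anything up to 1024; each extra call to step doubles the largest bound available.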

We now have a menu of three techniques that let us write almost any program we want:

We can encode recursion using an explicit stack.
We can change unbounded loops into loops with a conservative upper bound.
We can generate structures of exponential size with a linear amount of code.
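To make the first two techniques concrete, here is a small sketch that sums a tree without writing any recursive function. The Tree type and the names step and sumTree are hypothetical, and foldl' over [1 .. fuel] plays the role of a bounded for loop with a conservative upper bound supplied by the caller.

```haskell
import Data.List (foldl')

-- Hypothetical example data: a binary tree of Ints.
data Tree = Leaf | Node Tree Int Tree

-- One iteration of the loop body: pop the top of the explicit stack,
-- add its value to the accumulator and push its children.
step :: ([Tree], Int) -> ([Tree], Int)
step ([], acc) = ([], acc)
step (Leaf : stack, acc) = (stack, acc)
step (Node l x r : stack, acc) = (l : r : stack, acc + x)

-- The bounded loop: run step exactly fuel times, then read the accumulator.
-- A tree with n Nodes needs 2n+1 steps; any larger fuel gives the same answer.
sumTree :: Int -> Tree -> Int
sumTree fuel t = snd (foldl' (\st _ -> step st) ([t], 0) [1 .. fuel])

tree :: Tree
tree = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)
```

With enough fuel, sumTree 7 tree gives 6. With too little, the loop silently stops early and returns a partial sum, which is exactly the kind of hazard the bound has to be chosen to avoid.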

The consequences

Firstly, we still don't have a Turing complete language. The code will terminate. But there is no guarantee on how long it will take to terminate. Programs that take a million years to finish technically terminate, but probably can't be run on an actual computer. In most of the domains where I've seen Turing incompleteness raised, a runtime of seconds would be desirable. Turing incompleteness doesn't help with that at all.

Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read. While there are three general purpose techniques to encode the logic, there are usually other considerations that cause each instance to be solved differently. I've written tree traversals, sorts and parsers in such restricted languages - the result is always a lot of comments and at least one level of unnecessary indirection.

Finally, code written in such a complex style often performs significantly worse. Consider QuickSort - the standard implementation takes O(n^2) time worst case, but O(n log n) time average case, and O(log n) space (for the stack). If you take the approach of building an O(n^2) list before you start to encode a while loop, you end up with O(n^2) space and time. Moreover, while in normal QuickSort the time complexity is counting the number of cheap comparisons, in an encoded version the time complexity relates to allocations, which can be much more expensive as a constant factor.

The solution

Most Turing incomplete languages with the standard complement of if/for etc. gain no benefit from the restriction. The exception is domains where you are proving properties or doing analysis. Two examples:

Dependently typed languages such as Idris, which typically have much more sophisticated termination checkers than just banning recursion and unbounded loops.
Resource bounded languages such as Hume, which allow better analysis and implementation techniques by restricting how expressive the language is.

Such languages tend to be a rarity in industry. In all the Turing incomplete programming languages I've experienced, recursion was later added, programs were simplified, and programming in the language became easier.

While most languages I've worked on made this evolution in private, one language, DAML from Digital Asset, did so in public. In 2016 they wrote:

DAML was intentionally designed not to be Turing-complete. While Turing-complete languages can model any business domain, what they gain in flexibility they lose in analysability.

Whereas in 2020 their user manual says:

If there is no explicit iterator, you can use recursion. Let’s try to write a function that reverses a list, for example.

Note that while I used to work at Digital Asset, these posts both predate and postdate my time there.