Wednesday, May 04, 2022

Working on build systems full-time at Meta

Summary: I joined Meta 2.5 years ago to work on build systems. I’m enjoying it.

I joined Meta over two years ago when an opportunity arose to work on build systems full time. I started the Shake build system at Standard Chartered over 10 years ago, and then wrote an open source version a few years later. Since then, I’ve always been dabbling in build systems, at both moderate and small scale. I really enjoyed writing the Build Systems a la Carte paper, and as a result, started to appreciate some of the Bazel and Buck design decisions. I was involved in the Bazel work at Digital Asset, and after that decided that there was still lots of work to be done on build systems. I did some work on Cloud Shake, but the fact that I wasn’t working on it every day, and that I wasn’t personally using it, made it hard to productionize. A former colleague now at Meta reached out and invited me for breakfast — one thing led to another, and I ended up at Meta working on build systems full time.

What I’ve learnt about build systems

The biggest challenge at Meta is the scale. When I joined they already used the Buck build system, which had been developed at Meta. Looking at the first milliseconds after a user starts an incremental build is illustrative:

  • With Shake, it starts the process, loads the database into memory, walks the entire graph calling stat on each input and runs any build actions.
  • With Buck, it connects to a running daemon, talks to a running file watcher (Watchman in the case of Buck) and uses reverse dependencies to jump to the running actions.

For Shake, on repos with 100K files, that process might take ~0.5s, but it is O(n). If you increase to 10M files, it takes 50s, and your users will revolt. With Buck, the overhead is proportional to the number of changed files, which is usually a handful.

While Shake is clearly infeasible at the scale of Meta, Buck was also starting to show its age, and I’ve been working with others to significantly improve Buck, borrowing lessons from everywhere, including Shake. Buck also addresses problems that Shake doesn’t, such as how to cope with multi-configuration builds (e.g. building for x86 and ARM simultaneously), having a separate file and target namespace and effective use of remote execution and caching.

We expect that the new version of Buck will be released open source soon, at which point I’ll definitely be talking more about the design and engineering trade-offs behind it.

What's different moving from finance to tech

My career to date has been in finance, so working at Meta is a very different world. Below are a few things that stand out (I believe most of these are common to other big tech companies too, but Meta is my first one).

Engineering career ladder: In finance the promotion path for a good engineer is to become a manager of engineers, then a manager of managers, and so on up. In my previous few roles I was indeed managing teams, which included setting technical direction and doing coding. At Meta, managers look after people, and help set the team direction. Engineers look after code and services, and set the technical direction. But importantly, you can be promoted as an engineer, without gaining direct reports, and the opportunities and compensation are equivalent to that for managers. There are definitely aspects of management that I like (e.g. mentoring, career growth, starting collaborations), and happily all of these are things engineers can still engage in.

Programmer centric culture: In finance the company is often built around traders and sales people. In tech, the company is built around programmers, which is visible in the culture. There are hardware vending machines, free food, free ice cream, minimal approvals. They’ve done a very good job of providing a relaxing and welcoming environment (with open plan offices, but I don’t mind that aspect). The one complaint I had was that Meta used to have a pretty poor work from home policy, but that’s now been completely rewritten and is now very good.

Reduced hierarchy: I think this may be more true of Meta than other tech, but there is very minimal hierarchy. Programmers are all just programmers, not “senior” or “junior”. I don’t have the power to tell anyone what to do, but in a slightly odd way, my manager doesn’t have that power either. If I want someone to tackle a bug, I have to justify that it is a worthwhile thing to do. One consequence of that is that the ability to form relationships and influence people is much more important. Another consequence that I didn’t foresee is that working with people in different teams is very similar to working with people in your team, since exactly the same skills apply. I can message any engineer at Meta, about random ideas and possible collaborations, and everyone is happy to talk.

Migration is harder: In previous places I worked, if we needed 100 files moved to a new version of a library, someone got told to do it, and they went away and spent a lot of time doing it. At Meta that’s a lot harder — firstly, it’s probably 100K files due to the larger scale, and secondly, telling someone they must do something is a lot less effective. That means there is a greater focus on automation (automatically editing the files), compatibility (doesn’t require editing the files) and benefits (ensuring that moving to the new version of the library will make your life better). All those are definitely better ways to tackle the problem, but sometimes, work must be done that is tedious and time consuming, and that is harder to make happen.

Open source: The process for open sourcing an internal library or tool in the developer infrastructure space is very smooth. The team I work in has open sourced the Starlark programming language (taking over maintenance from Google), the Gazebo Rust utility library and a Rust linter, plus we have a few more projects in the pipeline. As I write code in the internal Meta monorepo, it gets sync’d to GitHub a few minutes later. It’s also easy to contribute to open source projects, e.g. Meta engineers have contributed to my projects such as Hoogle (before I even considered joining Meta).

Hiring: Meta hires a lot of engineers (e.g. 1,000 additional people in London). That means that interviews are more like a production line, with a desire to have a repeatable process, where candidates are assigned teams after the interviews, rather than interviewing with a team. There are upsides and downsides to that—if I interview a strong candidate, it’s really sad to know that I probably won’t get to work closely with them. It also means that the interview process is determined centrally, so I can’t follow my preferences. But it does mean that if a friend is looking for a job there’s often something available for them (you can find details on compilers and programming here and a full list of jobs here), and a repeatable process is good for fairness.

Overall I’m certainly very happy to be working on build systems. The build system is the thing that stands between a user and trying out their changes, so anything I can do to make that process better benefits all developers. I’m very excited to share what I’ve been working on more widely in the near future!

(Disclosure: This blog post had to go through Meta internal review, because it’s an employee talking about Meta, but other than typos, came out unchanged.)