Saturday, January 04, 2014

Running Makefiles with Shake

Summary: Shake can interpret some Makefiles, just like Make.

Since version 0.10 the Shake build system also installs a shake executable that can interpret a Makefile, just like you would use with make. To try it out, just type shake where you would normally type make (i.e. in a directory containing a Makefile) and Shake will attempt to perform the build. I say attempt, as there are plenty of caveats, and it is unlikely that a randomly chosen Makefile will work.

The current state

  • shake can interpret some Makefile examples. For example, the first four examples from this Makefile tutorial can be executed by shake. There are plenty of missing features, too many to list. It is likely that some tutorials or simple projects will work, but most "real" projects will not.
  • shake is sufficiently compatible with make (for the subset of features it supports) that I actually run the GNU Make testsuite using shake (see test-make.sh). I currently pass 16 tests from 8 categories, out of 83 tests in 39 categories. One of the tests I pass is even failed by the MinGW version of make.
  • The command line arguments to shake are mostly a superset of those to make. In fact, all Shake programs that use shakeWithArgs support most of these options - including things like --jobs, --keep-going, --directory and --always-make.

The original plan

My original plan had 3 steps (spoiler alert: I never even finished step 1):

Step 1: I wanted to interpret many common Makefile patterns, so some simpler projects could use either make or shake interchangeably. I never expected it to work for complex projects like GHC or the Linux kernel, but did hope it would work for the average one-author open source project.

Step 2: I wanted to add need as a command that pulled in additional dependencies. One of the weaknesses of make (fixed by Shake) is that you have to specify all dependencies up front, before you have generated other dependencies. I wanted to extend the Makefile syntax to support:

foo.tar : foo.txt
    cat foo.txt | need
    tar -cf foo.tar `cat foo.txt`

Here cat foo.txt | need would add dependencies on all the files listed by foo.txt and build them if necessary, before continuing.
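For comparison, the same rule can be sketched as an ordinary Shake program. This is a minimal sketch, not the output of any converter: it assumes a recent Shake release (which provides the %> operator and cmd_), and it assumes foo.txt lists one file name per line.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["foo.tar"]

    "foo.tar" %> \out -> do
        -- readFileLines also records a dependency on foo.txt itself
        files <- filter (not . null) <$> readFileLines "foo.txt"
        -- Depend on every listed file, building them first if necessary
        need files
        -- Only run tar once all the listed files are up to date
        cmd_ "tar" "-cf" [out] files
```

The call to need appears partway through the rule, after foo.txt has been read, which is the behaviour the cat foo.txt | need line was meant to provide. Because the sketch uses shakeArgs, the resulting program also accepts flags such as --jobs and --keep-going.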

Step 3: I wanted to write a converter that took an extended Makefile and produced a Shakefile.hs Haskell program that used the standard Shake library, giving users a seamless migration path.

What now?

This feature has no active users, and I don't immediately see who would choose to use it. There are advantages of using shake over make, primarily profiling and progress prediction, but they aren't killer advantages. Make is a large and complex program, and while you could replicate it in Haskell on top of Shake, you probably don't want to. As an example, if a rule for a file does not update the file, Shake will not rerun that rule in subsequent builds but Make will. Arguably the Make behaviour is a bug, but like so many misfeatures, someone will have relied on it somewhere. Replicating Make bug-for-bug would be a lot of effort, end up very ugly, and if you don't have complete compatibility, big projects are unlikely to be able to switch.

Despite my pessimism around this feature, I don't intend to remove it. If it gains users I would be pleasantly surprised. If there is some small feature someone thinks it should support (ideally to gain additional users!) I'm happy to implement it. If people want to send pull requests, or even take over development of this part, I'd be very happy. Shake has gained a number of useful features while implementing Make compatibility, so even if the code is never used in anger, it was still of benefit.

A quick tour of the code

The code for the Make compatibility is all in the repo under Development/Make. It's relatively short and might serve as a nice example for intermediate level Haskell programmers. There are 5 files:

  • Type defines the type Makefile, which is the parsed representation of a Makefile (58 lines).
  • Parse takes a file and parses it, producing a Makefile value (71 lines).
  • Env provides a representation of the Make environment, describes which variables are assigned to which values, and evaluates expressions in the environment (40 lines).
  • Rules defines a new Shake rule type, mostly like the normal Shake file rule, but having slightly different semantics - if a rule fails to generate a file, that isn't an error (59 lines).
  • All puts all the pieces together - parsing a Makefile, evaluating the expressions within it while building up an environment, and then translating to Shake rules (135 lines).
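To give a flavour of what evaluating expressions in an environment involves, here is a small self-contained sketch. It is not the code from Development/Make: the Env type and the expand function are hypothetical names chosen for illustration, and it only handles flat $(NAME) references, with no recursive expansion or Make functions.

```haskell
import qualified Data.Map as Map

-- Hypothetical environment type: variable name to value
type Env = Map.Map String String

-- Expand $(NAME) references; unknown variables expand to the empty string
expand :: Env -> String -> String
expand env ('$':'(':rest) =
    case break (== ')') rest of
        (name, ')':after) -> Map.findWithDefault "" name env ++ expand env after
        _                 -> "$(" ++ rest   -- unterminated reference, left untouched
expand env (c:cs) = c : expand env cs
expand _   []     = []

main :: IO ()
main = do
    let env = Map.fromList [("CC", "gcc"), ("OUT", "foo.o")]
    -- Prints: gcc -c foo.c -o foo.o
    putStrLn (expand env "$(CC) -c foo.c -o $(OUT)")
```

A real interpreter also has to deal with the different assignment forms and with references inside values, which is where most of the complexity lies.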

The code is fairly well segregated and serves as a reasonable template for interpreting alternative input build specifications under Shake. The Development/Ninja source code follows much the same pattern.

13 comments:

Unknown said...

Neil, it's nice to see that you're proceeding with Shake development. There are features that Make is missing (among which I highlight the inability to assess file attributes or to automatically create a directory for the current file).

But I want to let you know that the main feature you decided to implement Shake for (which is described in Step 2) already exists in Make. In Make you can include other files into a Makefile and Make will parse them as Makefiles and append the rules from them. If you have a rule that updates those files (or even the Makefile itself), Make will run those rules and, if any file was updated, it will reread the whole Makefile and only after that continue to build the ultimate goal.

Also, I want to point out the VPATH feature in Make. This allows doing the following: if you have a large project you can have a master working copy of it where everything is built already (e.g. a nightly build) and have a personal working copy with a subset of sources, just enough to allow you to accomplish a particular task (e.g. your current task is to add some functionality to one of the libraries inside the project). You can have Make use prebuilt artifacts from the master working copy and rebuild only those artifacts that you have changed and those that depend on artifacts that you have changed.

That is a killer feature for large projects. I happen to work on a project that had 2+Gb of sources inside of it and am an expert in Make. Make is a powerful tool although it requires much attention to detail and understanding of what you're doing (in terms of writing build rules).

Andrew Wagner said...

Hi Neil, thanks for the post! Perhaps if enough of make was reimplemented on top of shake, shake would gain traction as a sort of lint for make based builds. Once your code passes, not only do you have a cleaned up make build, you also have a migration path to a pure shake build.

Neil Mitchell said...

Maxim: I am aware of the reloading, and it works, up to a point. However, it doesn't compose very nicely. You can define a rule to build *.obj, but including all makefiles that will later be generated (e.g. *.obj.mk) doesn't really work. The other issue is it rebuilds from scratch each time, and if you have a big enough project with enough of these generated dependencies you can end up restarting almost continuously. I find the need formulation in Shake to be far more practically useful - it feels natural, whereas the Make reloading feels hacky. I appreciate make experts may disagree though. Btw, I have some examples that are easy using need, but as yet, I haven't managed to hack up in make - see monad3 at https://github.com/ndmitchell/build-shootout - if you have an implementation of that it would be interesting.

I wasn't aware of VPATH, but can certainly see why that's useful. I know people using Shake have done similar things using symlinks which works quite nicely. The make solution seems a bit more built in though.

Unknown: Yes, that's certainly a possible route to go, the only question is whether make users would follow it. I also wonder if the use of "hacky bits" in Make is due to lack of expressive power in Make, and thus if it's possible to clean up the Makefile before migrating away. I'm not an expert Make user, so suspect others would know better than I do.

Unknown said...

There is no problem with including all *.obj.mk. The GNU Make manual suggests having all object files in a variable, translating it with patsubst and including in Makefile.
Although this is doable it is rather cumbersome. I use an easier approach: I have a pattern rule to build *.obj files that has an order-only dependency on *.obj.mk:

%.obj: %.c | %.obj.mk
...

Then I have a rule to build %.obj.mk.

%.obj.mk: %.c
...

The inclusion is done like this:

include $(shell find $(BUILD_DIR) -name "*.obj.mk")

So, on the first build it is assumed that a fresh build doesn't require extra dependency information (which is what the *.obj.mk files actually contain). After the first build, the *.obj.mk files are there and incremental builds know all extra dependencies.

As for your monad3 example, I understand it like this: you have some dependencies generated into a file. This is trivial in Make:


include list

list:
    monad3-run source -- $@

gen:
    monad3-gen -- $@


As for "rebuilds from scratch each time", this is not true. Rules for rebuilding those files are just plain Make rules and you can introduce any dependencies that you see fit. For *.obj.mk files I generate an extra rule that gives the *.obj.mk file the same dependencies as the *.obj file:

foo.obj foo.obj.mk: foo.c header1.h header2.h

Unknown said...

Oh, sorry, for the monad3 example you should have

-include list

Neil Mitchell said...

Maxim: Interesting trick, I suspect for some examples not having the information available before building the pattern might break it though. Using your description as a starting point I did manage to do monad3 in make. See https://github.com/ndmitchell/build-shootout/blob/master/examples/monad3-make . I am pretty sure Shake is either more powerful or equal in power to Make for these types of dependencies - but I'm still trying to figure out which.

For "rebuilds from scratch" I should have said "restarts from scratch" - e.g. it has to reread the makefile, and recheck the dependencies. That takes time, and if it happens regularly, destroys parallelism.

Unknown said...

First, Neil, when you write Make rules it is good practice to use special variables like $@, $^ and $< to refer to the rule goal, all dependencies and the first dependency. This makes the rule more portable and easier to read (you can tell whether this "list" is a verbatim string that should be there or just the name of the rule goal).

I have rewritten your monad3 Makefile to simplify it:
https://gist.github.com/maximkulkin/8273197

Not sure I understand your skepticism about this dependency generation approach. The thing is that in Make there are actually two stages of execution: first it reads the Makefile and all included files, then it tries to update the Makefile and all included files. If any file was updated, it rereads the Makefile and goes straight for the target goal (it doesn't try to update the makefiles a second time). If no makefiles were initially updated, it goes for the target goal without rereading them. The parallelism that you've mentioned happens _between_ the rereads: it can update makefiles in parallel or update the target goal in parallel. This is not a big issue considering the benefits that it provides. There is no parallelism between two builds (two "make" invocations). The thing is that it needs the whole dependency graph to be complete when it starts the update tasks and doesn't allow its modification during the update process.

I have tested this on a very large project and found that the Makefile reread time is insignificant compared to the size of the project and the other tasks that it needs to perform.

Neil Mitchell said...

Maxim: I'm deliberately not using any special variables in the Make example here, it allows all the examples to look closer to each other in the various languages. I appreciate real Makefiles would look a bit different, and the argument about the "type" of list is a good one. Similarly real Shake programs wouldn't use 'cat'. I should probably add a section to the Readme advising on what the violations of best-practice are, or perhaps just give up trying to make them similar ensuring it's more useful for teaching...

I notice your simplification changes the contents of "list". The rules of the test are that list looks exactly like the output of monad3-run. Again, this is mostly to make sure all the tests are equivalent - a real project would have more flexibility.

Taking a look at LLVM (a large Makefile project I have to hand on my machine) on Windows it takes about 199s to read all the Makefiles and do the initial evaluation, even if there is nothing to build. I wouldn't want to double/triple that with rebuilding. You also very much proceed in stages with the build all makefiles and restart approach. The more stages you have, the more that will cost in parallelism, as you coarsely assign each task to a stage, when some could occur in any stage.

I am far from a Make expert, and still learning about what the power of Makefile rereading is. I agree that it gives you some fundamental power that you wouldn't have had otherwise. However, I think 'need' gives you at least as much power, without the cost of stages and in a more convenient package.

I'm still not sure I can articulate (or even know) quite what I dislike about include files. It may be that with need you are saying "at this point, output depends on these additional things". Whereas with 'include' you are very much splicing in anything, which could do anything, and is thus less modular and harder to reason about. The dependencies of 'gen' depend on a sed fragment inside the list rule.

Unknown said...

Not using some system's best practices makes your tests synthetic and ruins the whole idea of comparison of whatever you're comparing (I guess, the ease of use of the system).

I have updated the gist and renamed "list" target to "list.mk" to emphasize that this is not the list but rather the part of a Makefile.

The problem with the LLVM build system is trivial: they use the recursive-Make style. This is the biggest flaw: they run a separate instance of Make in every subdirectory and most of the time is wasted on launching just another Make instance. The approach I'm talking about is using a monolithic-style Makefile where all submodules provide their own part of the Makefile, specially prepared for use in that type of build setup. I recommend reading this article for a better understanding of the problem: http://miller.emu.id.au/pmiller/books/rmch/

Neil Mitchell said...

The original purpose was to benchmark power, not ease of use at all.

I have read the recursive make thing, and agree with its conclusions. However, practical projects (e.g. LLVM) must contain people who have read that (hasn't almost everyone?) so why didn't they follow its recommendations? Why don't they convert - 199s zero rebuild times must be really painful. I'd be genuinely curious to find out why.

Unknown said...

Because of autoconf. It is very easy to create a cross-platform project with autoconf and not worry about chasing platform incompatibilities.
And autoconf dictates using the recursive Make style. The other thing is that it is much easier to integrate components from different authors when each component's Makefile is independent of other components. The monolithic Make style requires following the same conventions in every (sub)project.
People might use some tricks for rebuilding just the subproject they need so they do not waste 199s, so no one ever tried to fix that.
The other thing is that most people using Make aren't build experts and they do not know how to use it effectively. And those who know have other things to work on.

Where it really matters (in large enterprises) they do it the right way.

Unknown said...

BTW, really smart people in open source do not use autoconf and use hand-written Make rules. Check out e.g. nginx sources (http://nginx.org/en/).

Neil Mitchell said...

Maxim: thanks, that is really interesting information!