Non-Recursive Automake Performance

Date: July 30th, 2009 by Author: Daniel Elstner

A few days ago, Murray blogged about Glom’s switch to non-recursive Automake. The topic attracted more feedback than I had expected, and I have been asked if I could give a more detailed account of the speed improvements mentioned by Murray.

For the background, I highly recommend to read the original paper by Peter Miller, Recursive Make Considered Harmful. Here’s an appetizer:

For large UNIX projects, the traditional method of building the project is to use recursive make. On some projects, this results in build times which are unacceptably large, when all you want to do is change one file. In examining the source of the overly long build times, it became evident that a number of apparently unrelated problems combine to produce the delay, but on analysis all have the same root cause.

First off, here are the results for the Glom build, measured before and after the switchover:

make -j2 (clean tree)
337.0 seconds
recursive make
274.4 seconds
non-recursive make
make -j2 (nothing to be done)
4.7 seconds
recursive make
3.5 seconds
non-recursive make
make install
9.6 seconds
recursive make
7.0 seconds
non-recursive make
make clean
3.3 seconds
recursive make
2.9 seconds
non-recursive make

These results were obtained on a Lenovo desktop PC with an Intel Core 2 Duo E7200 at 2.53 GHz and 3 GiB RAM. The operating system is Ubuntu 9.04 (jaunty) for AMD64, installed on an ext4 volume of an encrypted LVM partition. At the time of the measurements, the full GNOME desktop interface was up and running, but otherwise idle. The entire measurement was repeated once with nearly identical results, and the timings presented here are the averages of the two runs.

As can be seen, the switch to non-recursive make reduced the time of a full Glom build by about 19 % on this machine, or more than one minute, which I think is quite decent. However, before you start to convert your own module, be aware that the performance gain you’ll see depends heavily on the specifics of your project. It’s practically guaranteed that you’ll see some improvement, but it could be a meager 0.1 % speedup. However, 40 % are just as possible. To help people gauge the possible effect on their project, I’ll try to outline the factors which I found to matter most in my experience.

Essentially, performance is improved because of two things: First, the time for make to get going and resolve dependencies includes a constant part, which introduces a significant delay even if there is nothing to rebuild. The second problem is that even with parallel make, subdirectories are normally entered in sequence, in order to prevent the number of parallel processes from increasing exponentially. A subdirectory cannot be left until all its targets have finished building. Unless the execution times of all parallel build processes overlap perfectly, one or more processor cores will idle. This effect will be more pronounced if the average number of source files per subdirectory is small. In the pathological case of one file per directory, there will be no parallelism at all. A classic example is the directory hierarchy of source code examples found in many library modules.

The programming language used also plays a role. On average, the idle periods due to imperfect overlap of parallel compilation processes will be longer if compilation is slow. Typically, a C++ source file will require substantially more time to compile than a C source file. The same holds for large source files versus small source files. The optimization level and other compiler settings also influence the compile time per object. With non-recursive make, the stalls due to imperfect overlap will only occur for targets which depend on multiple intermediate targets, such as when linking object files into a library. Even when make needs to wait for a dependency before it can continue with a target, it will still continue building any independent targets. Thus, the builds of multiple independent executables or libraries overlap, thereby hiding the stalls.

With all that mind, I think the 19 % speedup in the case of Glom should be considered above average. The directory hierarchy is deeply nested, with a small number of source files per subdirectory, although not as small as to be pathological. One important factor is that Glom used to build an intermediate static library in each subdirectory, which were later linked to produce the actual target binary. The new non-recursive build system omits these intermediate libraries and simply links all object files from all subdirectories in one go to produce the final target binary.

Automake has included support for non-recursive make for quite a while now, and it turns out to be surprisingly straightforward and easy to use. All you need to get going is AM_PROG_CC_C_O in your configure.ac for C compiler support, and AUTOMAKE_OPTIONS = subdir-objects in your Makefile.am. To refer to files in a subdirectory, just include the relative directory prefix. Automake variable names also include the directory prefix of the file names they are derived from. The *.am files in the Glom repository may be helpful as real-world examples.

Happy make -j∞ !

7 Responses

  1. Javier Jardón says:

    July 30th, 2009 at 21:30

    Thank you for the benchamrks,

    I’ve just added a link to this entry in http://live.gnome.org/JavierJardon/NewBuildSystem (Feel free to modify the page if you want ;) )

    Regards

  2. Stephen Gentle says:

    July 31st, 2009 at 02:33

    That’s a very nice performance improvement. I’m really sold on non-recursive automake – I’ve tried Cmake, which I didn’t like because it has some pretty dodgy syntax and it can’t do things like pkg-config without including extention scripts. And I’ve stayed away from Scons since I’ve not once been able to get a project using it to compile (it always seems to complain about errors in the SConstruct file, probably because everyone seems to set the minimum supported version to one higher than the one that is shipped with Ubuntu).

    Automake just works, and if you do it non-recursively, it works well – I think that really the main reason people don’t like it and switch to other solutions is that there just aren’t enough examples or tutorials about how to use it effectively.

    One other thing that I’m interested in playing with that could speed up compiling (and the performance of the application) might be to use llvm and Clang instead of GCC.

  3. Magnus says:

    July 31st, 2009 at 08:27

    Would you mind adding timing of running make a second time, i.e. what is the time difference for finding out that nothing needs to be done?

  4. Daniel Elstner says:

    July 31st, 2009 at 17:32

    Thanks for your comments. Javier, your comment got caught in the spam queue — sorry about that.

    I have now added the missing numbers for running make -j2 a second time with nothing to be done. I did several runs with a hot cache and averaged the results. Note that po and docs/user-guide directories are still entered recursively even with the new build setup.

  5. Popa Adrian Marius (mariuz) 's status on Tuesday, 13-Oct-09 11:57:49 UTC - Identi.ca says:

    October 13th, 2009 at 13:57

    [...] QotD:"Happy make -j∞ !" via non-recursive automake performance http://danielkitta.org/blog/2009/07/30/non-recursive-automake-performance/ [...]

  6. Non-recursive automake — drboblog says:

    December 29th, 2009 at 22:34

    [...] was largely possible due to the (fairly) recent literature on the subject from Murray Cumming and Daniel Elstner, which proved rather useful; so many thanks to [...]

  7. Philip Withnall: Non-recursive automake | Full-Linux.com says:

    December 31st, 2009 at 08:11

    [...] was largely possible due to the (fairly) recent literature on the subject from Murray Cumming and Daniel Elstner, which proved rather useful; so many thanks to [...]