Non-Recursive Automake Performance

Date: July 30th, 2009 by Author: Daniel Elstner

A few days ago, Murray blogged about Glom’s switch to non-recursive Automake. The topic attracted more feedback than I had expected, and I have been asked if I could give a more detailed account of the speed improvements mentioned by Murray.

For the background, I highly recommend to read the original paper by Peter Miller, Recursive Make Considered Harmful. Here’s an appetizer:

For large UNIX projects, the traditional method of building the project is to use recursive make. On some projects, this results in build times which are unacceptably large, when all you want to do is change one file. In examining the source of the overly long build times, it became evident that a number of apparently unrelated problems combine to produce the delay, but on analysis all have the same root cause.

First off, here are the results for the Glom build, measured before and after the switchover:

make -j2 (clean tree)
337.0 seconds
recursive make
274.4 seconds
non-recursive make
make -j2 (nothing to be done)
4.7 seconds
recursive make
3.5 seconds
non-recursive make
make install
9.6 seconds
recursive make
7.0 seconds
non-recursive make
make clean
3.3 seconds
recursive make
2.9 seconds
non-recursive make

These results were obtained on a Lenovo desktop PC with an Intel Core 2 Duo E7200 at 2.53 GHz and 3 GiB RAM. The operating system is Ubuntu 9.04 (jaunty) for AMD64, installed on an ext4 volume of an encrypted LVM partition. At the time of the measurements, the full GNOME desktop interface was up and running, but otherwise idle. The entire measurement was repeated once with nearly identical results, and the timings presented here are the averages of the two runs.

As can be seen, the switch to non-recursive make reduced the time of a full Glom build by about 19 % on this machine, or more than one minute, which I think is quite decent. However, before you start to convert your own module, be aware that the performance gain you’ll see depends heavily on the specifics of your project. It’s practically guaranteed that you’ll see some improvement, but it could be a meager 0.1 % speedup. However, 40 % are just as possible. To help people gauge the possible effect on their project, I’ll try to outline the factors which I found to matter most in my experience.

Essentially, performance is improved because of two things: First, the time for make to get going and resolve dependencies includes a constant part, which introduces a significant delay even if there is nothing to rebuild. The second problem is that even with parallel make, subdirectories are normally entered in sequence, in order to prevent the number of parallel processes from increasing exponentially. A subdirectory cannot be left until all its targets have finished building. Unless the execution times of all parallel build processes overlap perfectly, one or more processor cores will idle. This effect will be more pronounced if the average number of source files per subdirectory is small. In the pathological case of one file per directory, there will be no parallelism at all. A classic example is the directory hierarchy of source code examples found in many library modules.

The programming language used also plays a role. On average, the idle periods due to imperfect overlap of parallel compilation processes will be longer if compilation is slow. Typically, a C++ source file will require substantially more time to compile than a C source file. The same holds for large source files versus small source files. The optimization level and other compiler settings also influence the compile time per object. With non-recursive make, the stalls due to imperfect overlap will only occur for targets which depend on multiple intermediate targets, such as when linking object files into a library. Even when make needs to wait for a dependency before it can continue with a target, it will still continue building any independent targets. Thus, the builds of multiple independent executables or libraries overlap, thereby hiding the stalls.

With all that mind, I think the 19 % speedup in the case of Glom should be considered above average. The directory hierarchy is deeply nested, with a small number of source files per subdirectory, although not as small as to be pathological. One important factor is that Glom used to build an intermediate static library in each subdirectory, which were later linked to produce the actual target binary. The new non-recursive build system omits these intermediate libraries and simply links all object files from all subdirectories in one go to produce the final target binary.

Automake has included support for non-recursive make for quite a while now, and it turns out to be surprisingly straightforward and easy to use. All you need to get going is AM_PROG_CC_C_O in your configure.ac for C compiler support, and AUTOMAKE_OPTIONS = subdir-objects in your Makefile.am. To refer to files in a subdirectory, just include the relative directory prefix. Automake variable names also include the directory prefix of the file names they are derived from. The *.am files in the Glom repository may be helpful as real-world examples.

Happy make -j∞ !

Maemo on AMD64, one chroot less

Date: February 18th, 2009 by Author: Daniel Elstner

More than two years ago I created a small how-to on how to get Maemo on AMD64 working. The solution was a fragile nesting of one chroot (Scratchbox itself) inside a 32-bit install of Linux running in another chroot environment.

Now, given the ubiquity of 64-bit capable desktop computers these days, it’s a shame that a document like this is still needed at all. However, there are some good news, too. On a hunch, I recently tried to forcibly install the i386 Scratchbox packages on my Ubuntu 8.10 AMD64 system. Well, what shall I say — it actually worked!

Thus, everyone who has used my how-to in the past to get this beast running might want to have another look. Without all the chroot madness, it’s now drastically shorter and easier to follow. Enjoy!

What to do with your sawn-off number pad

Date: January 21st, 2009 by Author: Daniel Elstner

Are you one of those people who sawed off their keyboard’s numeric keypad? Don’t throw away that keypad just yet! There might still be a use for it as a nice birthday present for your SMS-crazed teenage sister, so she can type on her computer using the same awful interface as on her mobile phone!

It’s easy: After you have rewired the numpad using components from some spare keyboard, and secretly installed Linux on your sister’s computer, all that is left for you to do is to check out and install GTK+ trunk. Then just set the default GTK+ input method to Multipress, which has recently gone through a code overhaul. Apart from cutting the code to about two thirds the original size and using more efficient data structures, the new code now actually works, too! That is, it works like it was intended to on other keyboards than just the single one it was originally written for — i.e. it makes it possible to use a numpad to type text, SMS-style like on a mobile phone.

So, next time you decide to saw through your keyboard to save space — how about keeping the smaller end instead of the larger one?

Per-widget GTK+ Input Method

Date: January 14th, 2009 by Author: Daniel Elstner

It used to be the case that in order to change the input method used by GTK+ applications, you had two options: Change the global default input method by editing some configuration file. Or use the Input Method context menu entry of some text widget every time you want to use a different input method. And even though it’s in a context menu, until recently the setting applied to all text input widgets of an application.

Last month, an xsetting was added to GTK+ which allows one to change the input method on the fly desktop-wide (bug #502446). And yesterday, a patch landed in GTK+ trunk which adds an "im-module" property to both GtkEntry and GtkTextView (bug #450716), in order to allow applications to assign non-default input methods to selected widgets.

Although not generally useful, this new capability can sometimes come in really handy as demonstrated by this tiny test case. As can be seen in the screen shot, the second entry is preset for the input of IPA symbols.
Input method switch test

Cluttermm Tutorial

Date: January 7th, 2009 by Author: Daniel Elstner

C++ programmers rejoice! I finally finished converting the Clutter tutorial written by Murray Cumming to C++ and the Cluttermm API. From the introduction of the new Cluttermm tutorial:

Cluttermm is a language binding for C++ on top of Clutter. It has the same functionality and concepts as plain Clutter, but provides C++ programmers with an interface that makes use of language features and common concepts of C++, such as static type safety, class inheritance and (optionally) exception handling.

Both the C tutorial and the new C++ one have been updated for the Clutter 0.9 API currently in development, which will eventually become 1.0. Therefore, this tutorial isn’t final yet and some of the examples still have quirks, but they all build fine. Right now, it only works with Cluttermm trunk, but a development release of Cluttermm 0.9 can be expected soon.

« Previous Entries Next Entries »