Apache 2.0

Reuven M. Lerner

Issue #99, July 2002

Reuven discusses the significance of the 2.0 release for web developers, administrators and the Open Source community.

As I write this, Apache 2.0 has been out in stable form for nearly a month, and from everything I can tell, it's ready for prime time. While there are other open-source HTTP servers, Apache is the best known and best supported. Apache is used on 60% of the web sites in the world, comes with virtually every Linux distribution and is even part of several commercial application servers. Both Zope and Jakarta-Tomcat have their own built-in HTTP servers, but almost no one exposes these servers directly to the Web. Rather, they use Apache as a front end because of its combination of performance and flexibility. This month, we take a closer look at Apache 2.0 [see also “Apache 2.0: the Internals of the New, Improved 'A PatCHy'”, available at www.linuxjournal.com/article/4559].

Architecture

If you are familiar with Apache 1.x, then very few things in Apache 2.0 will surprise you. For starters, Apache continues to be highly modularized, allowing you to include only those modules that you deem necessary in your server. But whereas Apache 1.3 had a core module that included the basic HTTP implementation, Apache 2.0 delegates protocol support to modules as well. This has a number of advantages, including the fact that we can now add (and remove) protocols as necessary. In other words, Apache has now become a general-purpose internet server, rather than just an HTTP server. How many projects will take advantage of this functionality remains to be seen.

Apache was never meant to be the fastest server on the planet. Rather, it was designed to be extensible via a system of modules. Each module provides a different piece of functionality, so administrators interested in squeezing the last ounce of power from their systems don't have to include irrelevant modules. For example, if we know that our server will never run any CGI programs, then we can easily remove mod_cgi, gaining some CPU cycles and memory in the process.

Apache 2.0 continues in the long-standing Apache tradition of handling each HTTP transaction in a number of named phases. A module may examine or modify the transaction during any one of these phases by attaching its own handler to the appropriate hook. For example, mod_speling (which corrects capitalization and spelling mistakes in URLs—the name is purposely misspelled) attaches its handler to the “fixup” phase hook, executing immediately before the server generates a response.

In Apache 1.x, only one handler could fire for a given hook. In Apache 2.0, each handler not only registers itself for a given hook, but also indicates when it would like to execute relative to other modules' handlers; mod_speling, for example, asks that its handler run last (APR_HOOK_LAST). If another module were to register a handler with the fixup hook without making the same request, that handler would execute before mod_speling's. The fact that multiple handlers can fire for a given hook opens a world of possibilities that were previously too difficult to achieve.

On a similar note, Apache now makes it possible for one module to filter, or modify, the output of another module. This is currently possible with mod_backhand, but that module depends on a number of tricks and dark corners in the Apache API. Apache 2.0 is designed to allow modules to act as input or output filters. This means that if you want to add a standard set of headers or footers to your HTML pages, you can now do this across the board, including for dynamically generated pages created by CGI programs, server-side includes and mod_perl handlers.
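From the administrator's side, using an output filter can be as small as a few configuration lines. The snippet below is a minimal sketch rather than a recommended setup: it writes a fragment that sends CGI output through the server-side include filter. SetOutputFilter and the INCLUDES filter name come from Apache 2.0 itself; the file name filter-example.conf and the /cgi-bin/ location are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical file name; merge the contents into httpd.conf by hand.
fragment=filter-example.conf

cat > "$fragment" <<'EOF'
# Pass everything served under /cgi-bin/ (an assumed location) through
# the server-side include output filter after the CGI program has run.
<Location /cgi-bin/>
    Options +Includes
    SetOutputFilter INCLUDES
</Location>
EOF

echo "wrote $fragment"
```

With a fragment like this in place, an include directive printed by a CGI program should be expanded by the filter before the page reaches the browser, which is one way to get a shared footer onto dynamically generated pages.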

The Apache configuration system now uses GNU autoconf rather than the Apache-specific system that was in use for versions 1.x. In addition, many of the C-language abstractions (such as hash tables and strings) that were included in previous versions of Apache have been gathered into a library named the Apache Portable Runtime (APR). The APR is included with Apache and is configured and compiled into the server automatically when you build it.

Finally, Apache now comes with mod_ssl, which provides SSL and TLS encryption. Not only did Apache 1.x fail to come with such a module, but the two modules (Apache-SSL and mod_ssl) were incompatible and required patching the Apache source code before installation. The fact that mod_ssl will now be a standard part of every Apache installation is a huge relief for web site administrators and is most welcome.

MPMs

UNIX systems have long had the ability to run multiple processes simultaneously. I typically run Emacs, a GNOME terminal and Galeon on my Linux box; while a casual glance might only reveal these three processes, there are actually dozens more (sendmail, gnome-panel, Apache, syslogd and the like) that are executing without my direct knowledge. For a complete list of what is running on my computer, I can use the command ps aux.

The good news is that the process model is simple to understand, ensures stability on the system and is portable across many operating systems. Unfortunately, however, processes are relatively heavy and slow. Linux users are especially spoiled on this front because creating a new process on Linux is a surprisingly lightweight operation. But even on Linux, spawning a new process can sometimes be a bit extreme.

For this reason, an alternative model, threads, has grown in popularity over the years. Using threads, a single process can be executing in multiple places at the same time. Threads offer many of the benefits of processes without the overhead. But there is a cost: programming with threads can be extremely tricky because it's always possible that a particular piece of code is executing in two different threads at once. You can always write (or rewrite) code to be threadsafe, but this is often a difficult task.

Because threads are tricky to handle, and because Apache was originally designed to work only on UNIX machines, Apache 1.x worked exclusively at the process level: to handle ten simultaneous HTTP requests, you must have ten Apache processes running. Because it takes time to create a new process, Apache 1.x took an idea from NCSA HTTPd and preforks processes before they are actually needed. This means that Apache can be a bit slow to start up, but that handling incoming connections does not take much time. Apache also allows administrators to indicate how many “spare servers” should always exist, adding and removing Apache processes as necessary.
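One way to watch preforking at work is to count the server processes on a running machine. The helper below is a small sketch: count_httpd is a hypothetical name, and it assumes the server binary is called httpd, which is not true on every distribution.

```shell
#!/bin/sh
# Read one command name per line on stdin and print how many are
# exactly "httpd". The name "httpd" is an assumption; some systems
# install the binary as "apache" or "apache2".
count_httpd() {
    awk '$1 == "httpd" { n++ } END { print n + 0 }'
}

# On a live system, feed it the command-name column of ps.
if command -v ps >/dev/null 2>&1; then
    ps -e -o comm= 2>/dev/null | count_httpd
fi
```

Run it a few times while the server is under load, and the number should rise and fall as spare servers are created and removed.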

Preforked Apache servers are solid, well understood and robust. But on many systems, using processes is inferior to using threads. In particular, Windows uses threads far more than processes, which means that by sticking with processes, Apache was limited in its ability to penetrate the Windows market.

Apache 2.0 solves these problems with MPMs (multiprocessing modules). Each MPM is an Apache module that handles the details of processes and threads. On Windows, OS/2 and BeOS, this means that you can finally run Apache using a threading mechanism that is native to your operating system. On UNIX and Linux systems, you can experiment with a number of different models, choosing one that is appropriate for your needs.

The prefork MPM, which runs in exactly the same way as Apache 1.x did, is the default choice when you install Apache. Two other choices for Linux users are: 1) worker: each process contains a fixed number of threads, but the number of such processes rises and falls according to the number of incoming requests; and 2) perchild: the number of processes remains constant, but the number of threads in each process rises and falls according to the number of incoming requests.

It's too early to tell, but I expect that more MPMs will emerge over time, and that there will be numerous modules that take advantage of threads to pool database connections, share application data and spawn asynchronous tasks in the background.

Configuring and Installing Apache

Retrieve the latest source code from httpd.apache.org/dist/httpd; the latest version as of this writing is 2.0.35. Unpack the source code in a temporary directory:

cd /tmp
tar zxvf httpd-2.0.35.tar.gz
cd httpd-2.0.35

You may now run the configure program with one or more arguments. These arguments fall into roughly four categories: 1) Into which directories should Apache be installed? 2) Which MPM do you want to use? 3) Under which user ID should CGI programs execute? 4) Which modules do you want to install? And of those, which should be installed dynamically (using shared libraries) rather than statically?

You can get a full list of configuration options by typing ./configure --help. This is particularly useful if you want to include a module that isn't included by default. The biggest change in configuration is that modules now have their own options to activate them. For the simplest possible configuration that uses the “worker” MPM, type:

./configure --with-mpm=worker
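A somewhat fuller invocation, touching each of the four categories listed earlier, might look like the sketch below. The option names are ones that ./configure --help reports for Apache 2.0, but the particular choices (the nobody user, SSL support, mod_rewrite as a shared module) are assumptions for illustration, and configure_args is a hypothetical helper. Note that --enable-ssl expects OpenSSL to be installed already.

```shell
#!/bin/sh
# Assumed choices; adjust each one to suit your own site.
prefix=/usr/local/apache2      # 1) installation directory
mpm=worker                     # 2) MPM: prefork, worker or perchild

# Print one configure argument per line.
configure_args() {
    printf '%s\n' \
        "--prefix=$prefix" \
        "--with-mpm=$mpm" \
        "--enable-suexec" \
        "--with-suexec-caller=nobody" \
        "--enable-so" \
        "--enable-ssl" \
        "--enable-rewrite=shared"
    # 3) the suexec options govern the user ID under which CGI programs run
    # 4) --enable-so allows shared modules; mod_rewrite is built as one
}

# Show the command line, then run it only inside an unpacked source tree.
echo "./configure $(configure_args | tr '\n' ' ')"
if [ -x ./configure ]; then
    # Left unquoted on purpose so that each argument becomes its own word.
    ./configure $(configure_args)
fi
```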

Following this, run make, followed by make install. (There is no make test for Apache as of the time of this writing.) By default, Apache 2.0 is installed into /usr/local/apache2. You can start it using the same program as Apache 1.x, apachectl, which is normally in /usr/local/apache2/bin/:

/usr/local/apache2/bin/apachectl start

The server will soon start up. HTML documents will normally be kept in /usr/local/apache2/htdocs, so you should already be able to put HTML documents there and view them.
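If you want to confirm which MPM actually ended up in the binary, httpd -l lists the modules compiled into the server. The sketch below filters that list for the three MPM names mentioned in this article; which_mpm is a hypothetical helper name, and the path assumes the default installation directory.

```shell
#!/bin/sh
# Default installation path from the text; change it if you used --prefix.
httpd=/usr/local/apache2/bin/httpd

# Read the output of `httpd -l` on stdin and print the name of any MPM
# found among the compiled-in modules.
which_mpm() {
    awk '{ sub(/\.c$/, "", $1) }
         $1 == "prefork" || $1 == "worker" || $1 == "perchild" { print $1 }'
}

if [ -x "$httpd" ]; then
    "$httpd" -l | which_mpm
fi
```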

Apache's runtime configuration remains dependent on a text file, normally called httpd.conf. If you are familiar with Apache 1.x, then you will be happy to know that almost all of the existing directives will continue to work. The main directives that you will have to learn are those that pertain to threading, assuming that you use the worker or perchild MPMs.
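To give a feel for the threading directives, the snippet below writes a configuration fragment with one block for the prefork MPM and one for the worker MPM. The directive names are those used by Apache 2.0; the numbers are illustrative starting points rather than tuned recommendations, and the file name mpm-example.conf is an assumption.

```shell
#!/bin/sh
# Hypothetical file name; merge the contents into httpd.conf by hand.
fragment=mpm-example.conf

cat > "$fragment" <<'EOF'
# Used only when the server was built with the prefork MPM.
<IfModule prefork.c>
    StartServers         5
    MinSpareServers      5
    MaxSpareServers     10
    MaxClients         150
</IfModule>

# Used only when the server was built with the worker MPM.
<IfModule worker.c>
    StartServers         2
    MaxClients         150
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     25
</IfModule>
EOF

echo "wrote $fragment"
```

Because each block is wrapped in IfModule, the same file can be used unchanged whichever of the two MPMs you compiled in.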

Should I Switch?

When the Apache Software Foundation announced Apache 2.0, the announcement explicitly said that the new version is the most stable and recommended version for production use. And for the most part, I believe them; www.apache.org receives many more requests per day than my server, and they have been running Apache 2.0 beta versions for over a year. Thus, it's safe to say that Apache 2.0 is stable enough for most sites to use.

The main reason to avoid switching to Apache 2.0 at this point is if you need mod_perl or PHP; the versions of both that work with Apache 2.0 are still in testing but will probably be available by the time you read this.

As I mentioned above, however, it's hard to get threading right, and this is particularly true in Perl, which has experimented with a number of threading models in the last few years. If you have compiled Perl with ithreads, then you can use it to create a mod_perl for Apache 2.0 that uses the worker or perchild MPMs. But just how stable this configuration will be remains to be seen; it may well be that mod_perl users will choose to stick with the prefork MPM for now, until the dust settles a bit.
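Before attempting a threaded mod_perl, it is worth checking how your Perl was built. The sketch below relies on perl -V:useithreads, which prints the relevant build setting; has_ithreads is a hypothetical helper name, and the advice it prints simply restates the caution above.

```shell
#!/bin/sh
# `perl -V:useithreads` prints useithreads='define'; when Perl was
# built with ithreads, and useithreads='undef'; when it was not.
has_ithreads() {
    case "$1" in
        *"useithreads='define'"*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v perl >/dev/null 2>&1; then
    if has_ithreads "$(perl -V:useithreads)"; then
        echo "this Perl was built with ithreads"
    else
        echo "this Perl lacks ithreads; stay with the prefork MPM"
    fi
fi
```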

Conclusion

Apache 2.0 comes with more of everything that a web developer would want—more modules, more flexibility and greater speed. If you haven't yet tried Apache 2.0, I encourage you to download it and test it with your site's configuration to verify that it will be a good choice.

Resources

email: reuven@lerner.co.il

Reuven M. Lerner is a consultant specializing in web/database applications and open-source software. His book, Core Perl, was published in January 2002 by Prentice Hall.