Three-Tiered Design

Reuven M. Lerner

Issue #81, January 2001

Learn how to implement middleware into the mod_perl/Apache design mix.

Several months ago, we took a long look at Mason, a Web development environment that combines mod_perl, Apache and templates. One of the examples was that of a system for press releases, in which Mason components retrieve the latest press releases from a database. The style of programming demonstrated in this article could be characterized as “two-tier”, in which the Mason components speak directly with a database, using Perl's DBI mod_perl's Apache::DBI.

But as several people on the Mason e-mail list noted, this approach—in which the SQL statements are directly inside of Mason components—is often unwise. Modifying the database definitions, or even the brand of database server we use, forces us to change the components themselves. In addition, non-web programs must re-implement SQL calls within their own programs, rather than reusing a commonly maintained and tested library.

We can solve both of these problems by adding an extra layer of software, sometimes known as “middleware”, between the database and the Mason components. This increasingly popular architecture is known as a “three-tiered” approach, because it means that we must work with three different sets of software services: the database, a “middleware” abstraction layer, and the implementation/presentation layer.

This month and next, we will look at a simple web-based address book and appointment calendar that demonstrate this three-tier approach. Along the way, I hope that you'll learn the advantages and disadvantages of this approach and be able to weigh it alongside other approaches when creating a web site. Once we have looked into this general architecture, we will be well prepared to look into Java Server Pages with Jakarta-Tomcat, and application servers. We'll examine the pitfalls involved with this approach, as well as how it can make development easier and more scalable in the long run.

The Database

The first tier, and perhaps the most significant, is the relational database. I will use PostgreSQL for this example, but it would not matter much if we used something like Oracle or MySQL.

Before we design our database, we must have a specification describing what we want to do. After all, the design of a database depends in no small part on how we intend to use it.

For the purposes of this article, we will keep the specification relatively short and ambiguous: I want to have an addressbook that I can view, search and update via the Web. In addition, I would like to be able to add, view and modify appointments using my web browser.

In order to accomplish this, we will need at least two tables, one containing people and the other containing appointments. Here is an initial stab at a definition of the people table (see Listing 1).

Listing 1. Defining the People Table

In other words, we will always store a person's first name, last name, country and e-mail address. Beyond that, we can store their address, city, state, postal code and some comments about them. This assumes that everyone we know has an e-mail address—an assumption that is increasingly true, but not necessarily a good attribute if you have friends and business contacts from outside of the computer industry.

Each of the entries in our people table will be uniquely identified by the person_id column, which is automatically incremented by PostgreSQL. In addition, we ensure that we only add each person once by checking for the uniqueness of their e-mail address. This allows us to have more than one friend named John Smith. It also means that we cannot store separate information about a couple with a shared e-mail address. There also is no good provision for handling people with multiple e-mail addresses.

Adding a new person into the people table is relatively easy:

INSERT INTO People
        (first_name, last_name, address1, address2, email,
        city, state, postal_code, country, comments)
    VALUES
        ('Shai', 'Re\'em', '10
Helmonit', 'Apt. 7',
        'shai@lerner.co.il',
'Modi\'in', NULL,
        71700, 'Israel', 'Six-year-old
nephew')
    ;

Because most of the columns are NULL, I can even get away with entering a bare minimum of columns:

INSERT INTO People
        (first_name, last_name, email, country)
VALUES
        ('Hadar', 'Re\'em',
'hadar@lerner.co.il',
        'Israel')
    ;
Now we will create an appointments table in which we will store appointments with members of the people table:
CREATE TABLE Appointments (
     person_id    INT4       NOT NULL   REFERENCES People,
     start_time   TIMESTAMP  NOT NULL,
     end_time     TIMESTAMP  NOT NULL,
     notes        TEXT       NULL       CHECK
                (notes <>''),
        UNIQUE(start_time)
    );
Once I have the appointments table defined, I can add a new appointment by inserting a new row into appointments:
INSERT INTO Appointments
        (person_id, start_time, end_time, notes)
    VALUES
        (1, 'November 22, 2000 19:00',
        'November 22, 2000 19:30', 'Read Dr. Seuss')
    ;
But because person_id is defined so as to be a foreign key from people, we can only add an appointment if we are meeting with someone already in the people table. This might be adequate for our purposes, but a more sophisticated (and well-specified) system would presumably give us more flexibility. And of course, this database will not let me indicate that I am meeting with more than one person at a time.

Middleware

Now that we have created our initial database design, we will consider the design of our middleware layer, which insulates the web application from the database. If we ever decide to switch to another brand of database server, or even replace the database with flat ASCII files or DBM files, the object layer will remain the same.

In addition, non-web applications will be able to use this layer in order to access the database, making it possible, for instance, to write a set of routines that export our appointment calendar into XML, or to import it from another program.

This middle tier is often called the “business logic” of an application. The database makes it easy and safe for us to store and retrieve information, and the Mason components make it easy for us to create dynamic output for the end user. The middleware layer will try to force the database to do as much of the computation as possible, using built-in functions, views and stored procedures. But the actual logic that determines our application's functionality will reside in the middle tier.

Perl gives us at least two options when creating this layer. One possibility is to create a basic Perl module that provides subroutines and variables that can accomplish the tasks we need. Such a procedural interface is relatively easy to write and executes at the same speed as all other Perl subroutines.

But Perl also offers us the option of creating an object module. While Perl objects are slightly harder to write, and their methods execute more slowly than straight subroutines, they make it easier to conceptualize and write programs.

Before we can create our middleware layer, we have to answer some serious questions. What sorts of objects will we create? We could create a single database object that handles all of our queries for us, turning them into the appropriate SQL. But we will occasionally want to retrieve information about people without any regard for appointments, which means that we should have, if nothing else, a people object, as well as a separate appointments object. Because our database table definition forces us to associate each appointment with one person, we can only define our appointments object after the people object.

People.pm

Listing 2 contains a listing for People.pm, an object module that performs some basic tasks for the people table that we created earlier. The object is not complete and has some rough edges, but should suffice for demonstrating how to access a relational database via an object middleware layer.

Listing 2. People.pm, a Perl Object Module That Communicates with the Package People

The basic idea is that you create a new instance of people, and then manipulate the people in your appointment book with that object. To retrieve all of the names of people in the database, you can use the get_all_full_names method, as in this code fragment (also see Listing 3):

use People;
# Create a People object
my $people = new People;
# Retrieve all of the full names
my @names = $people->get_all_full_names();
# Print the names
foreach my $name (@names)
{
    print "name\n";
}

Listing 3. Retrieve-people.pl, a Program That Uses People.pm to Retrieve Information from the Database

To set or retrieve information about a particular person, you must first identify which person you're talking about. Because our middleware layer is meant to shield the user from having to think or worry about primary keys and other database-specific IDs, we will allow them to set the “current person” via either the first and last name, or via the e-mail address.

The e-mail address is guaranteed to be unique in the database layer, and thus using set_current_person_by_email is the most reliable method available. Nevertheless, it's often useful to identify people by first and last name, so we also offer the set_current_person_by_name method. In the current implementation, using the name to set the current person will match the first returned row from the database, which might not necessarily be what you want.

Once a program has set the current person, it can retrieve information about that person using the get_current_info method:

# Set the current person by name
$people->set_current_person_by_name("Shai","Re'em")
|| die "Error retrieving person.";
# Print the information
foreach my $key (sort keys %{$info})
{
        if (defined $info->{$key})
        {
                print "$key => $info->{$key}\n";
        }
}

Each instance of people will keep only two pieces of state: the ID of the currently selected person ($self->{current_person}) and the database handle ($dbh) that connects us to the database ($self->{dbh}). We keep the database handle around because connecting to a database is a relatively expensive operation. We can thus save ourselves some time by connecting to the database in the constructor, and then using that connection each time we invoke a method on this object.

Of course, this means that the database connection will have to be destroyed when the Perl object goes away—a somewhat tricky task, given that Perl does not have explicit destructors, since it is a garbage-collected language. The solution is to create a method called DESTROY, which is invoked by Perl whenever the object is destroyed. Our DESTROY method merely closes our connection to the database, allowing the object to be removed without potentially causing a memory leak in either the database client or server:

sub DESTROY
{
        # Get myself
        my $self = shift;
        # Get the database handle
        my $dbh = $self->{dbh};
        # Close the database handle
        $dbh->disconnect;
        return;
}

We can even create a new person, invoking the new_person method with a set of hash keys and values as arguments. These are then translated by the middleware layer into an appropriate SQL query:

# Now insert a new person
my $success = $people->new_person
        (first_name => "Reuven",
        last_name => "Lerner",
        country => "Israel"
        email => 'reuven@lerner.co.il',
        phone => '08-973-2225');
print "Inserted successfully" if $success;
Because Perl's undefined (“undef”) value is automatically translated into an SQL “NULL” value, the optional columns will be filled in with NULLs, as should be the case.

Appointments

Now that we have a class that handles the people in our database, we need to create an appointments class. For now, we will only concern ourselves with inserting new appointments and retrieving today's appointments.

The design of Appointments.pm (see Listing 4) is generally similar to People.pm, particularly in the way that it opens a database connection in the constructor and closes it in the automatically invoked DESTROY method. Beyond this, however, appointments keeps no state whatsoever. It merely acts as a conduit to the database, allowing us to create new appointments and find out with whom we are meeting today.

Listing 4. Appointments.p

For example, Listing 5 contains a short program that uses Appointments.pm to create a new appointment. We must create an instance of people and another of appointments. Once we have these two objects, we can set the “current person” to be my niece (“Hadar Re'em”), dying with an error if set_current_person_by_name returns undef (indicating failure).

Listing 5. Insert-appointment.pl

Once we have successfully set the current person, we can create an appointment with that person. The format of the date and time are dictated by PostgreSQL, which accepts a variety of formats.

We can similarly retrieve today's appointments using the program in Listing 6 (print-appointments.pl). This program uses the get_today method, which returns a list of hash references. Note that the implementation of get_today uses DBI's fetchrow_hashref method, which is known to be significantly slower than fetchrow_arrayref. However, it makes life much more convenient, allowing us to do print-appointments as seen in Listing 6.

Listing 6. Print-appointments.pl

Finally, we can list all of today's appointments with a particular person with the get_today_with_person method. Of course, this means that we must create an instance of people and choose a current person using one of the methods described earlier. The implementation of get_today_with_person expects to receive an instance of people as its first user-passed parameter, allowing us to use the current person in our SQL query. The program in Listing 7 demonstrates how I can find all of today's appointments that I have with my nephew Shai.

Listing 7. Print-appointments-with-shai.pl

Designing the Objects

One of the major points of using objects in a middleware layer is the fact that they provide a layer of abstraction. So long as the interface is well defined and remains stable, the implementation can change.

However, as with all programming techniques, designing good objects can be difficult. Perl provides totally open access to an object's internals, meaning that without a good API, programmers using the object might be tempted to reach inside and work directly with the implementation. This might mean that software will break when the implementation changes—the very situation that using objects was supposed to prevent!

In addition, we want our objects' implementations to be relatively separate from each other. During my design of the people and appointments objects, I was sorely tempted to allow Appointments to get and use the numeric ID of the current person. But of course, doing so would violate the abstraction barrier that I created with my object. The solution, which is admittedly not as elegant as I would like, was to create the get_current_person method. This allows appointments to retrieve the current user, without having to know where it comes from. In the end, of course, the return value from get_current_person is placed in an SQL statement and is compared with People.person_id, breaking the abstraction somewhat.

Finally, notice how each of the objects here contains basic logic, but does not store any state. It would be relatively simple, for example, for our people object to retrieve all of the rows from the people table, and to make them available to invoking objects from within Perl. Indeed, such a solution would significantly reduce the overhead of going to a database, and would allow us to perform manipulations in Perl, rather than turning to SQL each time.

But this solution causes many more problems than it solves. For example, what happens if we create two instances of people? Now we have two objects, each of which contains the full set of rows from the people table. If one object modifies its state, that modification will never be reflected in the second object. Worse yet, what happens if both objects modify their state before storing those changes in the database? Perhaps the database is designed to resolve such locking issues, but our Perl objects are not. Furthermore, what happens when we have 100,000 people in our people table? Reading that much data into a database client is a waste of memory, and of the high-performance data selection and manipulation routines that a database server includes.

Our objects are thus pipelines to the database, giving our web application the ability to talk to a database without having to include any SQL or knowledge of the tables' layout. The objects, by providing a standard API, make it possible to change the underlying implementation without having to announce those changes to the world.

Conclusion

This month, we looked at the reasons behind three-tiered architectures and started to look into a basic implementation of an application that uses this architecture. As you can see, we can already create small text-mode applications. Next month, we will complete our implementation, giving us the skeleton for a simple appointment book using mod_perl/Mason and PostgreSQL. We will also discuss issues regarding the scalability of three-tier solutions, as well as some of the pitfalls.

Resources

Reuven M. Lerner owns and manages a small consulting firm specializing in Web and Internet technologies. As you read this, he should (finally!) be finishing Core Perl, to be published by Prentice-Hall later this year. You can reach him at reuven@lerner.co.il, or at the ATF home page, http://www.lerner.co.il/atf/.