This Perl-based web helper and MySQL work together to let you quickly build a user registration system for your web site.
Over the last two months, we looked at the Perl-based web development known as Mason. Mason, written and maintained by Jonathan Swartz, is based on the idea of “components”, flexible templates that can contain HTML, Perl code, or a bit of each. Mason makes it possible to create large, dynamic web sites in a relatively short period of time. Moreover, it lends itself to easy and extensive code reuse, removing many of the common maintenance issues associated with web sites.
Because Mason is traditionally run on top of mod_perl, an Apache module that places a complete Perl binary inside of the web server, it can take advantage of other Perl modules developed for mod_perl. One particularly useful example of such a module is Apache::Session, which makes it possible to get around some of the problems associated with HTTP's statelessness.
This month, we will look at Apache::Session, using it to create a simple user registration system based on Mason. This system can make it relatively simple to create a personalized web site, connecting information in a relational database with a particular user.
HTTP was designed to be a lightweight protocol, with each transaction taking a minimum amount of time. As a result, it is fairly minimalist, with each connection consisting of a single request-response pair. (Modern versions of HTTP support multiple request-response pairs within a single transaction, but my impression is that the single-transaction version 1.0 is still the norm.)
In this model, the HTTP client connects to the server, sends a request and an optional parameter, and then one or more headers that describe the browser's capabilities. The HTTP server then returns one or more headers describing the response, followed by the response itself. The response can be an HTML-formatted text document, an image, or an error message indicating that the request could not be fulfilled. After the server sends this response, it closes the connection.
Because each HTTP transaction takes place in a vacuum, without any information from other transactions, it is difficult to keep track of a user's actions. The web has no sense of “logging in” or “logging out”, unlike a traditional computer environment. It is impossible to know whether five HTTP requests were initiated by five separate users on the same computer, or one user interested in five different URLs.
Two main techniques get around these problems. The first, called “cookies”, allows the server to store a name-value pair on the user's computer. The cookie is set in a “Set-Cookie” header at the beginning of the server's HTTP response. Every time the browser returns to a site within this server's domain, it sends a “Cookie” header as part of the request, with the name-value pair that was previously stored. Cookies are limited in length, can be deleted by a browser at any time and can easily be inspected and modified by a user.
Another technique, which we will not explore this month, involves the use of a URL's “path_info” segment. For example, consider the URL www.example.com/cgi-bin/foo.pl/abc/def. If /cgi-bin/foo.pl exists on the server, then /abc/def is passed as an additional argument that exists separately from any name-value pairs submitted from the client.
While neither cookies nor path_info is a perfect solution for the issue of state on the Web, they are sufficient for most needs. However, these solutions address only the problems with HTTP; they don't provide a means for giving our programs a sense of state.
Apache::Session bridges this gap, making it possible to associate arbitrary information along with a user. (We will soon discover that things are not quite this simple, but the overall principle is sound.) Apache::Session, which is available from CPAN, works with either cookies or path_info, and can store information using mechanisms ranging from ASCII files to relational databases. It is designed to work with mod_perl, and thus works with Mason; the documentation indicates that Apache::Session should also work under CGI, although I have not tested this claim.
Because of their versatility and speed, and because Apache::Session works best when associated with additional information in a relational database, we will use MySQL for our back end, called the “object store” in the module's documentation. In order to do this, we will need to create a table named “sessions” in our database, which looks something like this:
CREATE TABLE sessions ( id CHAR(16), length INT(11), a_session TEXT );
Apache::Session requires the table to be named sessions and that it contain three columns: an id column of type CHAR(16), a length column of type INT(11) and an a_session column of type TEXT or BLOB, which can contain any amount of binary data.
Each unique session is identified by a unique 16-character string, stored in the id column. The actual session data is stored in the a_session column, in the “nfreeze” format defined by the Storable module. (Storable is also available from CPAN.)
Each time a user's browser sends an HTTP request to the web server, it sends whatever cookies have been stored by that domain. So if a cookie was set by cnn.com, my browser will return only that cookie—which is, after all, simply a name-value pair—when I visit http://www.cnn.com/ again.
The cookie version of Apache::Storable takes advantage of this by storing a unique identifier in a cookie. This unique identifier corresponds to the id column in the sessions table. This allows us to retrieve any data that have been stored in a_session. Because a_session is defined to be infinitely long, the amount of data we can store is limited only by our database and our file system.
Data stored in table sessions by Apache::Session is available to programs via the global %session hash. %session is created anew for each incoming HTTP request, and refers to only the data stored in a_session. Storing something in %session places it in the a_session column, and retrieving something from %session gets the value from a_session. Assuming that the variables $first_name, $last_name and $email contain the appropriate pieces of information, we could store them reliably with the following lines of Perl:
$session{first_name} = $first_name; $session{last_name} = $last_name; $session{email} = $email;
Since each user (actually, each session) is stored in a separate row of the database, we do not need to worry about users clashing with each other.
In order for sessions to work, we must make a connection between the Apache::Session::DBI module and the corresponding sessions table on disk. This connection must take into account three different possibilities: (a) that the user sends us a valid ID cookie, (b) that the user sends us an invalid ID cookie, and (c) that the user sends us no ID cookie at all.
The first case is the easiest; the program merely needs to re-establish the connection between %session and the appropriate row in sessions, using Perl's “tie” mechanism. In the second case, the program must create a new session if it could not re-establish a previous one. And if the user sends no cookie at all, then we must create a new row in sessions, attach a unique ID to it and send that unique ID to the user's browser in the form of a cookie.
When working with Mason, we put all this in our start-up file. This file, which the Mason documentation calls handler.pl (but which I prefer to call mason.pl), defines all of Mason's main behaviors and allows us to define global variables that other elements of the system will require. Defining %session in mason.pl also ensures that it is available in all Mason components. See Listing 1 for a simple example of mason.pl for a site that wants to include sessions. (Much of Listing 1 comes straight from the Mason documentation.)
The most important part of this file is a call to Perl's eval command. eval comes in two forms, one of which takes a code block as an argument, and forms as a primitive form of error-checking. Inside our code block, we attempt to use Perl's tie command to connect the hash %HTML::Mason::Commands::session to the Apache::Session::DBI module. Tying these two together means that the default storage and retrieval mechanism associated with hashes no longer applies for %session—when we retrieve or modify its value, one or more methods in Apache::Session::DBI will take over:
eval { tie %HTML::Mason::Commands::session, 'Apache::Session::DBI', ($cookies{'AF_SID'} ? $cookies{'AF_SID'}->value() : undef), { DataSource => $dbsource, UserName => $dbuser, Password => $dbpass }; };
If this eval is unsuccessful, the variable $@ will contain the error message. Here, we test to see if the object exists in the data store. If so, then we assign the user a new session:
if ( $@ ) { if ( $@ =~ m#^Object does not exist in the data store# ) { tie %HTML::Mason::Commands::session, 'Apache::Session::DBI', undef, { DataSource => $dbsource, UserName => $dbuser, Password => $dbpass }; undef $cookies{'AF_SID'}; } }Finally, if the user does not pass us any identifying AF_SID cookie at all, we create a new one and tell mod_perl to send it along with the rest of the outgoing headers:
if ( !$cookies{'AF_SID'} ) { my $cookie = new CGI::Cookie(-name => 'AF_SID', -value => $HTML::Mason::Commands::session{_session_id}, -path => '/',); $r->header_out('Set-Cookie', => $cookie); }Once these are in place, any Mason component can store and retrieve information in %session. Apache::Session's use of the Storable module means that references and complex data structures (such as arrays of arrays, and hashes of hashes) can be stored in %session without us having to worry about losing data.
Just because we can store anything in %session does not mean we necessarily should. For instance, a site that wants to keep track of users' names and e-mail addresses could potentially store this information in %session. While doing so makes the information readily available from within Mason components, it creates other problems. For instance, it would be difficult to retrieve the rows of “sessions” and use them to create a mass mailing to subscribers' e-mail addresses.
For this reason, I generally use Apache::Session to store only one value, the primary key associated with the user's row in a Users table. (There are other ways to accomplish the same task, such as including the user's unique 16-character ID field in the Users table and adding a “UNIQUE” constraint on it.) If $session{user_id} exists, then we can assume the user has previously registered, and use that value to retrieve other information from Users. If $session{user_id} does not exist, then we assume the user is new to our system.
Here is one possible definition for a Users table which we can use in this way:
CREATE TABLE Users ( user_id MEDIUMINT AUTO_INCREMENT, username VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL, password VARCHAR(20) NOT NULL, password_hint VARCHAR(60) NOT NULL, PRIMARY KEY(user_id), UNIQUE(username), UNIQUE(email) );
We define all of the columns in this database as NOT NULL, meaning that they are mandatory fields. Aside from the user's unique ID (which is automatically generated by MySQL), user name and e-mail address, we require a password and a password hint. As we will see, these will allow us to create a full login system, and to handle some of the problems associated with HTTP cookies.
Now that we have defined a Users table, it is time to define some Mason components. Some of these components will be similar to subroutine, and others will be similar to HTML fragments. As we saw last month, both are acceptable (and welcome) types of Mason components. I typically use an .html suffix on top-level components that are visible to the user, and a .comp suffix on others—but you may wish to set up your own conventions.
Before we do anything else, we will need a component that allows us to connect to the database, and to retrieve a database handle (traditionally known as $dbh). Because Mason typically runs under mod_perl, we will take advantage of the Apache::DBI module, which keeps a database connection open even after an HTTP request has been served. Reusing database connections in this way dramatically increases the speed of our application, since logging in to a database can be relatively slow.
Listing 2 contains a simple Mason component that connects to the database and returns a valid $dbh. By putting this functionality inside one component, we avoid having to include that code inside every other component on the site. Moreover, it means that if we have to modify the data source name (“DSN” in Perl lingo), we can do so by changing one file.
Notice how database-connect consists solely of <%perl> and <%once> sections, without any HTML. This is an example of a component that acts purely as a Perl subroutine, returning a value to its caller. By contrast, Listing 3 contains register-form.html, a top-level component that contains only a few lines of Perl. The majority of register-form.html is straight HTML, and can be written by a graphic designer, rather than a programmer.
Registering is a relatively straightforward process. Information typed into register-form.html is sent to register.html (see Listing 4). The latter retrieves the name-value pairs from the form, placing them into scalar variables using the Mason <%args> section. If one or more elements are missing, register.html gives the user an error message indicating that the information needs to be updated.
If the user's registration information appears to be complete, register.html performs a quick SELECT to ensure that the user name will indeed be unique. True, we have defined the table such that a user name must be unique, but we would rather produce a nice-looking error message for our users than display an error message from the database.
Note that this code creates a race condition; it is possible that two users could try to register with the same user name simultaneously. Both would be told that the user name is available, and yet only one would be allowed to insert the requested user name. Databases that support transactions, such as PostgreSQL, can avoid this problem by wrapping the SELECT and the following INSERT into a single transaction, which can then be rolled back if there is an error.
register-form.html attempts to be somewhat helpful, reminding users if they are already logged in. (After all, there usually isn't any reason to register if you're already logged in.) It uses the component get-user-info.comp (see Listing 5), which takes one argument (a user ID) and returns a hash reference describing the user with that ID. Since user IDs are stored in %session with the user_id key, we can retrieve a hash reference with user information as follows:
my $user_info = $m->comp("get-user-info.comp", user_id => $session{user_id});
If $session{user_id} is undefined—that is, if the user has no session—then get-user-info.comp returns undef. Otherwise, a program can retrieve information for the user with the hash reference's keys. Indeed, the top of register-form.html demonstrates this:
% if ($user_info) { <P>You are currently logged in as <b><% $user_info->{username} %></b>. Do you really want to register?</P> % } else { <P>You are not logged in. Go ahead and register!</P> % }
register.html automatically logs in a user. By this, we mean that it sets the value of $session{user_id} to a valid primary key for the Users table. When $session{user_id} is set, a user is said to be logged in; when it is undefined, the user is not.
Logging out a user, then, is as simple as undefining the value $session{user_id}. We do exactly this in Listing 6, logout.html. Once a user visits this page, he or she is no longer logged in. Note that the line
undef $session{user_id};
does not remove the user_id key from %session. Rather, it assigns the undefined value to $session{user_id}.
If a user fails to log out, then the session will remain active for as long as the session cookie exists. Cookies are normally assigned an expiration date when they are created, indicating the maximum date on which they should be transmitted to a server. If no expiration date is mentioned, the cookie should disappear when the user exits from the browser. Session cookies are normally set with the latter expiration date, forcing them to disappear when the user quits from the browser.
However, this doesn't mean that users can ignore the “logout” button. On the contrary, someone who fails to log out is effectively saying that any HTTP requests originating from a particular computer should be attributed to his or her user name. In a typical office, where everyone has their own computer, this might not be a serious issue. However, a student in a recent class I taught told me that she was able to read someone else's e-mail at an Internet cafe, because Yahoo! Mail had failed to log out the previous user.
If the information is particularly sensitive, you might want to force users to re-register every 15 or 30 minutes. Simply set the cookie expiration date and time to be something in the very near future, and the cookies will expire automatically.
Logging in is slightly more complicated, in that we must ask the user for a user name and password. These pieces of information, supplied from login-form.html (listing 7), are passed to the login.html component (listing 8). login.html performs two tasks: it submits a SELECT query to the database, requesting the user_id column for the submitted user name and password. If no such row exists, $sth->fetchrow_array returns undef, and we thus know that the user does not exist. If it does exist, then we retrieve all the relevant information about this user into a hash reference and set $session{user_id} to the newly rediscovered user ID. This restores the session information to the user's browser, which sets it in a cookie (or path_info, as appropriate).
While there is no room to discuss it here, it would obviously not be very difficult to create a “password-remind.html” component which allows users to retrieve their password using the hint they entered in the initial registration form.
Of course, personalized sites are rather uninteresting if they store only the user's name and e-mail address. Things get much more interesting if the site keeps track of users' interests, birthdays and stock portfolios. But once we have a unique ID that represents this user—the user_id column in Users—we can create as many tables as we like, identifying each user with their primary key.
Session management can be a tricky subject when working with the Web, since it means using a stateless connection for something it was never intended to do. With the help of Mason and Apache::Session, it is not difficult to develop a personalized site which keeps track of users' interests and customizes the site's output accordingly.