At the Forge

Rails and Databases

Reuven M. Lerner

Issue #139, November 2005

After years of painful Web development, here's a development framework based on understanding how Web developers really use relational databases. Rails standardizes the tweaky parts for you to save time.

Last month, we began looking at Ruby on Rails, a Web development framework that has captured a great deal of attention in only a short time. Much of the success of Rails is due to the ease with which Web/database developers can accomplish various tasks. Indeed, Rails fans often tout the fact that their applications have almost no configuration files, allowing programmers to concentrate on development, rather than logistics.

This month, we begin to look at how Rails works with relational databases. Even if you won't be using Rails in your own Web development work, the way Rails addresses many different issues is extremely elegant and may well influence future generations of object-relational technologies.

The Problem

The database side to Rails attempts to solve a seemingly simple problem. Where and how should a Web application store persistent information? Nearly any Web application we might want to build, from a shopping cart to a calendar/diary, needs to store its information somewhere. And because Web applications run on the server, rather than on the user's desktop, we need to keep track of data for many different users, rather than just one.

Back in the olden days of Web development, when applications were far less sophisticated, some of us used basic text files. But we quickly discovered that a relational database was an improvement on nearly every level. Relational databases are designed to provide fast, secure and flexible access to the data that we want—so long as we can represent our data as two-dimensional tables.

But as simple as that last sentence makes it sound, moving data from a program into a database is neither simple nor straightforward. Sure, the simple stuff is indeed pretty simple; it's not a big deal to keep track of customers' bank balances, or even the latest transactions in their checkbooks. But there are big differences between the objects that are increasingly at the center of the programming world and the tables that are at the center of the database world. Consider the contortions that database programmers go through in representing arbitrarily deep hierarchies, and you'll begin to understand how the mapping between objects and tables can be quite complex.

There are basically three ways to bridge this gap between objects and tables: handle it manually, replace the tables with objects and use an automatic mapping tool. The manual approach, which is probably the most common and popular, simply means that the programmers stick SQL queries into the code. To get the contents of a shopping cart, we do something like this Perl code:


# Send the shopping-cart query
my $sql = "SELECT item_id, item_name,
                  item_price, item_quantity
             FROM ShoppingCart
            WHERE user_id = ?";
my $sth = $dbh->prepare($sql);
$sth->execute($user_id);

my $total_cost;

print "<table>
              <th>Name</th>
              <th>Price</th>
              <th>Quantity</th>\n";

# Iterate over the elements of the shopping cart
while (my $rowref = $sth->fetchrow_arrayref())
{
    my ($item_id, $item_name, $item_price,
            $item_quantity) = @$rowref;

    $total_cost += $item_price * $item_quantity;

    print "<tr><td>$item_name</td>
               <td>$item_price</td>
               <td>$item_quantity</td></tr>\n";
}

print "<tr><td>Total cost:</td>
           <td>$total_cost</td></tr>
       </table>\n";

The first few times you write such code, it doesn't seem so bad. But after a while, it begins to grate on you. Why are you writing so much SQL, when all you want is the elements of your shopping cart? Even if you wrap the SQL inside of an object, you'll find yourself creating many such objects over the course of a project.

The people who wrote Zope, a Python-based Web application framework, decided that although relational databases have their place, the real solution to this problem is to avoid the object-table translation as much as possible, opting instead for an object database. ZODB (Zope Object Database) thus allows you to store and retrieve Python objects as part of a hierarchy. If you can represent data in a Python object, ZODB makes it easy to keep that data persistently.

But of course, ZODB has its problems as well. To begin with, you can use it only from Python; by contrast, relational databases typically can be accessed from any number of languages. And although ZODB now has multiversion concurrency control (MVCC), transactions and a host of other features, the fact that it simply stores a set of objects means that you can't easily sort, search or perform “joins”, which are the cornerstone of the relational world.

Object-Relational Mappers

The third alternative, namely that of having an object-relational mapper, has become increasingly popular. The basic idea is pretty simple. Your program uses objects, and those objects are automatically transformed into rows, columns and tables in a relational database.

For many years, object-relational mappers have had all sorts of difficulties, particularly when working with sophisticated data sets. But they are now increasingly robust and impressive; and though I have not worked with either of them, Hibernate (for Java programmers) and SQLObject (for Python programmers) offer just these sorts of services, and Alzabo (described in this column several years ago) provides such services for Perl programmers. When implemented correctly, object-relational mappers provide the best of both worlds, including all of the speed, cross-language and maintenance benefits of a relational database along with the flexibility and consistency of working with objects from within the code.

When Rails burst onto the Web development scene about a year ago, its proponents touted the fact that Rails allows you to produce a Web/database application with almost no configuration and with very little code. And indeed, this is the case, thanks to several different features. One of the key features that makes this possible, however, is a sophisticated object-relational mapper known as ActiveRecord.

ActiveRecord is a Ruby class that is traditionally used as the parent of model classes within a Rails application. As you may recall, Rails uses the traditional model-view-controller (MVC) paradigm to build Web applications. Unlike some MVC application frameworks, Rails makes the differences between these explicit, creating models, views and controllers subdirectories within the application's app directory. A model class in Rails doesn't have to inherit from ActiveRecord, in which case it functions like any other data structure or class. But if it does inherit from ActiveRecord (or more precisely, from ActiveRecord::Base), the object knows how to store and retrieve its values from a table in a relational database.

At this point, you might be asking, “Wait a second—how is it possible that inheritance alone can provide an object-relational mapping? Don't I need to configure something?” The short answer, amazing as it might seem, is “no”. There is, of course, a slight trade-off, one that might bruise your ego if you aren't careful. Rails is able to accomplish this magic by forcing all programs to adhere to a particular set of conventions. Indeed, one of the Rails mantras is “convention over configuration.” If you are willing to name your tables, columns and objects according to the accepted convention, Rails will reward you handsomely. If you insist on using your own conventions, or if you want to connect Rails to an existing set of tables, you might find yourself struggling to implement even the simplest application.

Connecting

So, how do we connect Rails to our database? Much of the documentation I have seen uses the popular open-source MySQL database for its examples; I strongly prefer PostgreSQL, and thus use it in my examples instead. However, you will soon see that the choice of a back-end database is almost invisible when it comes to Rails.

If you haven't done so already, install the Ruby Gems package, and then use the gem command to install Rails, all of its dependent classes and postgres-pr:

$ gem install --remote rails
$ gem install --remote postgres-pr

Now we use the rails command to create a new Rails application. If you still don't have the Weblog application we began last month, you can create it by typing:

$ rails blog

In many Web/database frameworks, the individual page or program must connect to the database each time. In Rails, the underlying system connects to the database for us, automatically tying the database connection to the ActiveRecord object class. The configuration is kept under the application directory in config/database.yml. No, that's not a typo; the extension is yml (YAML, or Yet Another Markup Language, or YAML Ain't a Markup Language), a simplified text format that is easier to read, write and parse than XML.

Traditionally, every Rails application uses three different databases, one each for development, testing and production. These three databases are created with a prefix that reflects the application name and a suffix that reflects its use (either development, test or production). For example, this is the database.yml file for the blog application:

development:
  adapter: postgresql
  database: blog_development
  host: localhost
  username: blog
  password:

test:
  adapter: postgresql
  database: blog_test
  host: localhost
  username: blog
  password:

production:
  adapter: postgresql
  database: blog_production
  host: localhost
  username: blog
  password:

Notice how the database adapter name is postgresql, even though I used the postgres-pr gem to connect to it. Also notice that the database is accessed by a user named blog. For this to work correctly, I now have to create the blog user in PostgreSQL (not as a Linux user):

$ /usr/local/pgsql/bin/createuser -U postgres blog
Shall the new user be allowed to create databases? (y/n) y
Shall the new user be allowed to create more new users? (y/n) n
CREATE USER

Now that we have created the blog user, we use it to create the three databases:

$ /usr/local/pgsql/bin/createdb -U blog blog_development
CREATE DATABASE
$ /usr/local/pgsql/bin/createdb -U blog blog_test
CREATE DATABASE
$ /usr/local/pgsql/bin/createdb -U blog blog_production
CREATE DATABASE

Finally, we should create a table in our database. We use only the development database for now, but we adhere to the convention of writing our table definitions in the blog/db directory, in a file named create.sql:

CREATE TABLE Blogs (
id           SERIAL   NOT NULL,
title        TEXT     NOT NULL,
contents     TEXT     NOT NULL,

 PRIMARY KEY(id)
);

I have already mentioned the importance of following Rails conventions when working with the ActiveRecord object-relational mapper, and the above table definition, as simple as it seems, already uncovers two of them. To begin with, every row has a unique ID field named id. (PostgreSQL, following SQL standards, has case-insensitive table and column names by default.) In PostgreSQL, we ensure that every row has a unique value of id by declaring it to be a SERIAL type. If you're like me, and have always used more explicit names (such as, blog_id) for the primary key, you'll need to change in order to work with Rails.

Another convention, and one that is a bit more subtle to notice, is that our table name is Blogs, a plural word. A class descended from ActiveRecord::Base is automatically mapped to a database table with the same name, but pluralized. So if we create a blog class that inherits from ActiveRecord::Base in models/blog.rb, it is automatically mapped to the blogs table in our database. As you can see, your choice of a name can affect the readability of your code; be sure to choose a name that makes sense in a number of different contexts, both singular and plural. (In this case, my choice of words was admittedly unfortunate, because each row of the Blogs table represents one posting, rather than one Weblog.)

But it gets better—we don't need to create blog.rb ourselves, at least not at first. We can ask Rails to create it for us, using script/generate. script/generate can be used to create a model, controller or view; in this case, we create our model:

ruby script/generate model blog

You will see some output that looks like this:

exists  app/models/
exists  test/unit/
exists  test/fixtures/
create  app/models/blog.rb
create  test/unit/blog_test.rb
create  test/fixtures/blogs.yml

If we open up app/models/blog.rb, we see that it's nearly empty:


class Blog < ActiveRecord::Base
end

Although we can (and will) add new methods to our Blog class, we can actually leave it as it stands. That's because ActiveRecord provides our class with enough skeleton methods that we can get by without them.

Although it's nice that we now have a Ruby class that is automatically mapped to our Blogs table in the database, we still have to access our table via the Web. This means we need to create a controller class, because controllers (the C in MVC) are the components in Rails that handle incoming HTTP requests. We can generate a controller automatically:

ruby script/generate controller blogadmin

Unfortunately, this controller isn't tied to our class at all. And although we could make such a connection ourselves, the fact that we're at the very beginning of our application definition means we can take a bit of a shortcut, asking Rails to generate an entire set of scaffolding, or bare-bones classes, that will do much of what we want. Creating such scaffolding is a great way to get jump-started with Rails development or even for working on a new project. At the same time, generating the scaffolding means blowing away class definitions you already have written. Because we have (so far) used only the default classes, this shouldn't be much of a problem.

We generate the scaffolded application with:

ruby script/generate scaffolding Blog Admin

(You should answer “Y” or “a” to replace one or all of the existing files, as appropriate.)

This creates a controller class named Admin that gives us basic access to a Blog class. The latter then connects to the Blogs table in the database.

With only the scaffolding in place, we can now start the server:

ruby script/server

Then, we point our browser to the application, at the /admin URL: http://localhost:3000/admin.

Sure enough, we see—nothing at all, aside from a few links that let us add a new entry into our Blogs table. If you click on add, you now will see a form that lets you create a new Weblog entry. These automatically generated pages are in the app/views subdirectory. In particular, look at new.rhtml and list.rhtml in app/views/admin. You can, of course, change these views—and in a production application, you will. But for getting your feet wet with Rails, or just trying out an application idea, this is indeed pretty useful.

Now, when you go to the add page, you might be surprised to discover that there is one field for each of the columns in the Blogs table, except for id. This is the result of some cleverness on the part of the automatically generated scaffolding code; it looked at the table definitions and decided what kind of input area to show. What happens if we add another column to our Blogs table that represents when the blog entry was added? (After all, a Weblog whose contents aren't sorted in date order isn't going to be very useful.)

To save time, we simply go in and modify our table definition, using the ALTER TABLE command:

$ psql -U blog blog
% ALTER TABLE Blogs ADD COLUMN posted_at
        TIMESTAMP NOT NULL DEFAULT NOW();

If you look at the table definition (with the \d command in the psql client program), you'll see that it now has a new column named posted_at. The naming conventions in Rails extend to the names of columns; columns of type DATE should be named xxx_on, and columns of type TIMESTAMP (that is, both date and time) should be named xxx_at.

We now need to regenerate our scaffolding code, blowing away any previous version that might have existed (which is okay in this particular case):

ruby script/generate scaffolding Blog Admin

Next, restart the server and go back to the new blog page. You will see that it has changed, so that it now includes a posted at field. Moreover, you can't enter arbitrary text there; a full-blown date-entry set of selection lists is in place. If you ever have written code to handle the entry of dates in a Web application, this alone should be a pleasant change.

Finally, take some time to explore both the application (using your Web browser) and the updates that take place in the database as you add, modify and delete rows. Without having written even a single line of Ruby code, you should find yourself able to use the Web-based forms to modify the database. If you want to be a bit adventurous, you can even modify list.rhtml, which shows you the current list of blog entries.

Conclusion

Many Web/database frameworks have struggled to offer a persistent storage layer that interfaces cleanly with the programming language itself. Embedded SQL code isn't too terrible on a small scale, but even a medium-size application can result in a great deal of SQL queries in the middle of an otherwise object-oriented application. The Rails solution strikes a balance that I find quite pleasing, forcing very small, logical changes on me in exchange for a great deal of time savings.

Of course, it's not very hard to create an object-relational mapper when all you need to worry about is column types and individual tables. Moreover, you'll quickly discover that as written, our simple blog application has several problems. To begin with, it has an administrative interface, but no method for displaying the blog to the world! Also, it doesn't display blog entries in any sort of chronological order. Next month, we will see how to solve these problems, as well as how Rails enforces data integrity with a few simple lines in our model definitions.

Resources for this article: /article/8526.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.