At the Forge

Working with ActiveRecord

Reuven M. Lerner

Issue #140, December 2005

Reuven Lerner continues his series on Ruby on Rails with this look at database integrity checking using ActiveRecord.

For the past few months, we have been looking at Ruby on Rails, the hot new open-source toolkit for creating Web/database applications. One of the core elements of this toolkit, as we saw last issue, is the ActiveRecord class, which automatically translates between Ruby objects and data in a relational database. Object-relational mappers, as such software is often known, bridges the gap between the object-oriented and relational worlds, which treat data in fundamentally different ways.

This month, we look at some of the ways we can modify ActiveRecord to validate our data in various ways. We also see how we can work with classes that depend on one another, doing something a bit more sophisticated than the basic scaffolding provides with only a few simple lines of code.

Primary Keys

When I first started to work with relational databases, I would create tables that looked like this:

CREATE TABLE People (
    first_name    TEXT  NOT NULL,
    last_name     TEXT  NOT NULL,
    phone_number  TEXT  NOT NULL,
    email_address TEXT  NOT NULL
);

And of course, the above definition of People will work just fine, providing the basis for a computerized address book. However, the above definition has several problems. To begin with, what happens if there is more than one person with the same name? That is, if we have two people named George Washington in our database, we're going to have a serious problem. How will we know which is the George we want?

The solution to this problem is to assign a unique number to each record in the database. Each relational database product has a different way of accomplishing this. In PostgreSQL, we add a new column and assign it a SERIAL type, indicating that it should be a nonrepeating integer:

CREATE TABLE People (
    id            SERIAL   NOT NULL,
    first_name    TEXT   NOT NULL,
    last_name     TEXT   NOT NULL,
    phone_number  TEXT   NOT NULL,
    email_address TEXT   NOT NULL
);

We then tell PostgreSQL that it should consider id to be not just another column, but the primary key, an identifier that is guaranteed to be unique and that can serve as identification for one row in the table:

CREATE TABLE People (
    id            SERIAL   NOT NULL,
    first_name    TEXT   NOT NULL,
    last_name     TEXT   NOT NULL,
    phone_number  TEXT   NOT NULL,
    email_address TEXT   NOT NULL,

    PRIMARY KEY(id)
);

Although we can now find people in our address book with their first or last names, we also can do so using their unique ID. Even if there are 100,000 people named George Washington in our database, we can unambiguously find the one that interests us using the id column. Think of the times you have been asked to identify yourself using a driver's license number, a national ID number or a Social Security number, and you quickly will realize that each of these can be used as a primary key in a database.

One additional result of this constraint is that the database creates an index for the id column. Even if you have a very large table of addresses, the fact that id is indexed means that the database can use it to find records quickly. In addition, although SERIAL columns can be set manually in an INSERT statement, just like INTEGER columns, they're normally not set explicitly at all. Rather, PostgreSQL assigns the next consecutive integer to be the column value—perfect for a primary key, whose value must be unique.

Foreign Keys

Primary keys are useful in this way, but we have not yet begun to understand their power. That's because primary keys really come into their own when they make it possible for us to link tables together. For example, consider a computerized appointment calendar that we might want to build as an add-on module to our existing address book. We could create a table like the following:

CREATE TABLE Appointments (
    id        SERIAL    NOT NULL,
    person_id INTEGER   NOT NULL,
    start_at  TIMESTAMP NOT NULL,
    end_at    TIMESTAMP NOT NULL,
    comment   TEXT,

    PRIMARY KEY(id)
);

The above table has an id column, uniquely identifying every appointment. It also has two columns identifying the time at which the appointment starts and ends, as well as room for an optional comment or description.

But there is also a person_id column, which allows us to indicate with whom we will be meeting. This database design has a number of problems, but perhaps the most striking one is that there is no constraint (other than NOT NULL) on the value that we can assign to person_id. Even if our People table is empty, we can assign person_id to be 10, 100 or 996—these numbers might be acceptable technically, but they don't help us ensure that person_id refers to an actual person.

The solution is to define person_id as a foreign key, indicating that values of person_id are legitimate only if they reflect an existing value in the People table. In PostgreSQL, we accomplish this as follows:

CREATE TABLE Appointments (
    id        SERIAL    NOT NULL,
    person_id INTEGER   NOT NULL  REFERENCES People,
    start_at  TIMESTAMP NOT NULL,
    end_at    TIMESTAMP NOT NULL,
    comment   TEXT,

    PRIMARY KEY(id)
);

With these conditions in place, we can be sure that we will be able to make an appointment only with someone in our address book. What happens if we try to get around it? Let's see:


INSERT INTO People (first_name, last_name,
                         phone_number, email_address)
    VALUES ('George', 'Washington',
            '202-555-1212', 'first.prez@whitehouse.gov');

When we SELECT the elements of our database table, we can see the value that was automatically assigned to our id column:

 id | first_name | last_name  | phone_number |       email_address
----+------------+------------+--------------+---------------------------
  1 | George     | Washington | 202-555-1212 |  first.prez@whitehouse.gov

Now let's insert an appointment with George:

INSERT INTO Appointments (person_id, start_at, end_at, comment)
VALUES (1, '2005-Oct-2 18:00', '2005-Oct-2 20:00', 'Dinner');

So far, so good. But, what happens if we try to insert an appointment with a nonexistent person?

INSERT INTO Appointments (person_id, start_at, end_at, comment)
VALUES (200, '2005-Nov-2 18:00', '2005-Nov-2 20:00',
            'Dinner with no one');

PostgreSQL rejects our INSERT statement, saying that inserting the row would violate the constraint introduced with the REFERENCES command:

ERROR:  insert or update on table "appointments" violates foreign key
constraint "appointments_person_id_fkey"
DETAIL:  Key (person_id)=(200) is not present in table "addressbook".

What happens if we try to remove George from our People table while we have an appointment with him?

DELETE FROM People WHERE id = 1;

Once again, PostgreSQL rejects our request, indicating this time that we cannot remove an item that is being pointed to:

ERROR:  update or delete on "addressbook" violates foreign key
constraint "appointments_person_id_fkey" on "appointments"
DETAIL:  Key (id)=(1) is still referenced from table "appointments".

ActiveRecord and Foreign Keys

All of the constraints we have seen so far have been at the level of the database, rather than any application using that database. This potentially means trouble for the users of those applications who don't have access to the database definitions. After all, what is supposed to happen if the application tries to insert, delete or modify a row such that it violates a constraint?

The simple answer, and one that is still prevalent in a surprisingly large number of Web/database operations, is that the program simply reports an error. (Sometimes it even will indicate what the error was, needlessly exposing the offending SQL statement for everyone to see.) In some cases, the application indicates that there was a database problem, or something of the sort.

But, what we really would like is to avoid those sorts of database problems altogether. We would prefer to have the constraints in our database somehow be propagated to the application level, letting the application catch problems before they ever get to the database level.

Although ActiveRecord cannot do this, it comes very close, making it almost trivially simple for us to represent relationships between tables in a Rails application. Let's now create a simple Rails application that uses ActiveRecord to keep track of our address book and calendar information.

We begin by creating the skeleton Rails application by typing rails addressbook, which creates an addressbook directory and puts everything underneath that. Then, we modify config/database.yml to point to development, testing and production databases in the appropriate place. (See last month's At the Forge for an example of what database.yml should look like.)

Now, let's create basic models, controllers and views for the People and Appointment tables. We could use the script/generate program that comes with Rails to create them separately. But in many cases, it's easiest to create a bare-bones application, or scaffold:

ruby script/generate scaffold Person
ruby script/generate scaffold Appointment

We can now start the test server on port 3000 (script/server); going to /People shows the current list of people and lets us create a new person. Click on the new person link, and you will see the page the scaffolding created. However, not all is perfect here—what happens if you click on the create button at the bottom of the page without entering anything in the text fields?

Assuming the definition of the People table described earlier, Rails will create a new person whose fields are all the empty string. We could solve the problem by modifying the definition of the People table, adding checks to ensure that the contents of each field is a non-empty string—but if we were to do this, Rails would show us the database error, complaining that we had violated an integrity constraint.

The solution is to modify the Person object so that it catches such errors, forcing the user to enter something in each field. We do this by modifying the Person class definition, located in app/models/person.rb. When we first open person.rb, we see that it is an unchanged subclass of ActiveRecord::Base:


class Person < ActiveRecord::Base
end

We can add one of the built-in Rails validators, statements that allow us to check the integrity of the data at the application level, before it ever gets to the database level. In this case, we use validates_presence_of, naming each of the fields from our table:


class Person < ActiveRecord::Base
    validates_presence_of :first_name, :last_name,
                          :email_address, :phone_number
end

With this in place—and without even having to restart the server—we can try adding another blank person. But now we find that Rails has stopped us, explaining the problem (for example, “Phone number can't be blank”) at the top of the form and outlining each of the offending fields in red. With this validator in place, we can be sure that all of the rows in the People table will contain valid data.

When we go to /Appointments to add a new appointment, something seems suspicious even before we click on the create button at the bottom of the page: there isn't anywhere that we can enter the person with whom we are meeting! This will cause problems, as clicking on the create button quickly demonstrates; PostgreSQL returns an error, which Rails displays for all to see. Clearly, we need to solve this problem.

The problem is that the view for creating new instances of the Appointment class (that is, app/views/appointments/new.rhtml) is missing an HTML form element named appointment[person_id]. If new.rhtml were to include appointment[person_id], it would be submitted along with the rest of the elements of the form and inserted into the database.

The thing is, appointment[person_id] should be populated from the database. Assuming that we have a variable named @people available to us, we could add something like this to new.rhtml right before the call to submit_tag:


<b>Person:</b><br />
<select name="appointment[person_id]">
    <option value="">Select a person</option>
    <% @people.each do |person| %>
    <option value="<%= person.id %>">
    <%= person.first_name %>
    </option>
    <% end %>
</select><br />

The above RHTML code is similar to JSP and ASP in that it embeds Ruby code inside of an HTML document. Code surrounded by <% %> is executed in place, while code surrounded by <%= %> is replaced by its return value.

The above code thus defines an HTML form element named appointment[person_id]. It then creates an option with a blank value. Next, we get into a standard Ruby idiom, iterating over the elements of a list, using person as an iterator, pulling out person.id as the value and person.first_name as the text. In other words, we create a <select> list of the people in our People table.

But where does @people come from? We have to define it, but we can do that inside of the Appointments controller object, app/controllers/appointments_controller.rb. That file contains all of the methods the scaffolding system created for us. We merely have to add one line to the new method definition:

@people = Person.find_all

Now, we know that @people is a variable we're defining, and we know that Person is a subclass of ActiveRecord::Base that hooks us to the People table in our database. The find_all method returns all of the elements in the table.

Finally, we modify our data model class, appointment.rb, adding a validator to ensure that we will have nonblank values for each of the fields:


class Appointment < ActiveRecord::Base
    validates_presence_of :start_at, :end_at, :comment, :person_id
end

With all of this in place, we can begin to schedule appointments. Each appointment will be with a single person, and we can be sure that it will contain all of the data that we want. Moreover, we know that by the time PostgreSQL receives the data to be inserted, it will be valid.

Conclusion

Although constraints in our database ensure that the data will always be valid, we generally want to perform such validation at the application level. Unfortunately, doing so is tricky or time consuming in many languages. ActiveRecord, the object-relational mapper at the heart of Ruby on Rails, makes it relatively easy to ensure that your users never have to see a database error. It comes with a number of validators, as well as an infrastructure for creating custom ones. Moreover, it comes with a number of routines that let us describe the relationships among different tables. With some small modifications to the controllers, views and models, we are able to create a custom application with valid data quickly.

Resources for this article: /article/8580.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.