At the Forge

Rails and PostgreSQL

Reuven M. Lerner

Issue #235, November 2013

The latest version of Ruby on Rails now offers closer integration with PostgreSQL.

Regular readers of this column won't be surprised to hear that I love both Ruby on Rails and PostgreSQL. Rails has been my primary server-side Web development framework for about eight years, and it has managed to provide solutions for a large number of consulting and personal projects. As for PostgreSQL, I've been using it for about 15 years, and I continue to be amazed by the functionality it has gained in that time. PostgreSQL is no longer just a relational database. It's also a platform supporting the storage and retrieval of many types of data, built on a rock-solid, ACID-compliant, transactional core.

When I started to develop using Ruby on Rails, most of the other developers (including the core Rails developers at 37Signals) were using MySQL. As a result, Rails didn't offer any support for PostgreSQL-specific features. Indeed, one of my favorite Rails features always has been database migrations, which allow developers to change a database schema incrementally. The downside of such platform independence is that special features often are ignored, and indeed, in order to serve the lowest common denominator, many of PostgreSQL's features were ignored or relegated to third-party gems.

During the past few years, PostgreSQL has grown in popularity, both overall and within the Rails community. This is partly due to the large (and constantly growing) feature set that PostgreSQL provides. However, I'm guessing that it also has to do with the fact that Oracle now owns MySQL, along with the growth of the popular Heroku hosting service. Whether Heroku is an appropriate choice for your application is a decision that should be made on a case-by-case basis. However, the fact that Heroku offers a free tier for tiny data sets, and that it uses PostgreSQL by default, has made it a popular option for people learning Rails, for small applications and for many people who want to outsource their hosting.

As as result of PostgreSQL's growing popularity, the latest (4.x) version of Ruby on Rails includes extensive, built-in support for many PostgreSQL features. In this article, I introduce a number of these features, both from the perspective of a Rails developer and from that of a PostgreSQL administrator and DBA. Even if you aren't a Rails or PostgreSQL user, I hope these examples will give you a chance to think about how much you can and should expect from your database, as opposed to handling it from within your application.

UUIDs as Primary Keys

One of the first things database developers learn is the need for a primary key, a field that is guaranteed to be unique and indexed, and that can be used to identify a complete record. That's why many countries have ID numbers; using that number, government agencies, banks and health-care systems quickly can pull up your information. The usual standard for primary keys is an integer, which can be defined in PostgreSQL using the SERIAL pseudo-type:

CREATE TABLE People (
    id     SERIAL PRIMARY KEY,
    name   TEXT,
    email  TEXT
);

When you use the SERIAL type in PostgreSQL, that actually creates a “sequence” object, on which you can invoke the “nextval” function. That function is guaranteed to give you the next number in the sequence. Although you can define it to have a step of more than one, or to wrap around when it's done, the most usual case is to use a sequence to increment an ID counter. When you ask PostgreSQL to show you how this table is defined, you can see how the “id” field's definition has been expanded:

\d people
                          Table "public.people"

+--------+---------+--------------------------------------------+
| Column |  Type   |                 Modifiers                  |
+--------+---------+--------------------------------------------+
| id     | integer | not null default 
                     ↪nextval('people_id_seq'::regclass) |
| name   | text    |       |
| email  | text    |       |

+--------+---------+--------------------------------------------+
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)

So, you can see that there's nothing special about the “id” column, except that it has a default value. If you don't specify the value of “id” in your INSERT statement, PostgreSQL will invoke nextval on a sequence. In this way, you can be sure that the “id” column always will have a unique value.

But, what if you don't want to use integers? I've always been partial to them, but it is common and popular to use UUIDs (universally unique IDs). One of the advantages of UUIDs is that they are (more or less) guaranteed to be unique across computers, allowing you to merge records from multiple servers. If you were to do this with an integer primary key, you might well have multiple records with an ID of 5 or 10. But with UUIDs, this is far less likely.

In theory, PostgreSQL always has supported the use of UUIDs as primary keys. After all, you can just use a text field and have your application generate and insert the UUIDs. But that puts the onus on the application, which isn't really appropriate. A better solution is to use PostgreSQL's uuid-ossp extension, which has shipped with the last few versions of the database. In a modern version of PostgreSQL, you can issue the SQL command:

CREATE EXTENSION "uuid-ossp";

Note that you must use double quotes here, because there is a - character in the identifier. Double quotes tell PostgreSQL to keep an identifier precisely as you have written it (don't confuse this with single quotes, which are used for text strings).

Also note that extensions are installed only in the database where you issued the CREATE EXTENSION command. Thus, if you add an extension to the “foo_development” database, it won't be in the “foo_production” database automatically. To ensure that an extension is present in all databases, add it to “template1”, from which all new databases are copied.

Once you have installed the extension successfully (which the database will confirm by echoing your command, CREATE EXTENSION), you can use it. Like many PostgreSQL extensions, uuid-ossp defines a new data type and functions that know how to use it. For example, you now can invoke the uuid_generate_v1() function, getting back data of type “uuid”:

select uuid_generate_v1();
+--------------------------------------+
|           uuid_generate_v1           |
+--------------------------------------+
| 6167603c-276b-11e3-b71f-28cfe91f81e7 |
+--------------------------------------+
(1 row)

If you want to use a UUID as your primary key, you can redefine the table as follows:

CREATE TABLE People (
    id UUID NOT NULL PRIMARY KEY DEFAULT uuid_generate_v1(),
    name TEXT,
    email TEXT
);

As you can see, here you have replaced the SERIAL type with a UUID type (defined by the extension) and have instructed PostgreSQL to invoke the UUID-generating function when no UUID value is provided. If you insert a row into this table, you'll see that the UUID is indeed generated:

INSERT INTO People (name, email)
VALUES ('Reuven', 'reuven@lerner.co.il');

SELECT * FROM People;
+--------------------------------------+--------+---------------------+
|                  id                  |  name  |        email        |
+--------------------------------------+--------+---------------------+
| 9fc82492-276b-11e3-a814-28cfe91f81e7 | Reuven | reuven@lerner.co.il |
+--------------------------------------+--------+---------------------+

Now, all if this is great if you're working directly at the database level. But Rails migrations are supposed to provide a layer of abstraction, allowing you to specify your database changes via Ruby method calls. Starting in Rails 4, that's possible. I can create a new Rails application with:

rails new pgfun -d postgresql

This will create a new “pgfun” Rails app, using PostgreSQL as the back-end database. I then create an appropriate database user at the command line (giving that user superuser privileges on the database in question):

createuser -U postgres -s pgfun

I then create the development database:

createdb -U pgfun pgfun_development

Now you're now ready to create your first migration. I use the built-in Rails scaffold mechanism here, which will create a migration (as well as controllers, models and views) for me:

rails g scaffold person name:text email:text

Notice that I haven't specified a primary key column. That's because Rails normally assumes there will be a numeric column named “id”, which will contain the primary key. However, you're going to change that by opening up the migration file that was created in db/migrations. By default, the migration looks like this:


class CreatePeople < ActiveRecord::Migration
  def change
    create_table :people do |t|
      t.text :name
      t.text :email

      t.timestamps
    end
  end
end

By passing an additional parameter to create_table (before the block, in the first line), you can indicate that you want the primary key to be a UUID:


class CreatePeople < ActiveRecord::Migration
  def change
    create_table :people, id: :uuid do |t|
      t.text :name
      t.text :email

      t.timestamps
    end
  end
end

With that in place, your primary key still will be called “id”, but it will be of type UUID. You can run the migration to be sure with:

bundle exec rake db:migrate

Sure enough, the table has been defined as you might like:

\d people
                              Table "public.people"

+------------+-----------------------------+--------------------------+
|   Column   |            Type             |         Modifiers        |
+------------+-----------------------------+--------------------------+
| id         | uuid                        | not null default 
                                             ↪uuid_generate_v4() |
| name       | text                        |               |
| email      | text                        |               |
| created_at | timestamp without time zone |               |
| updated_at | timestamp without time zone |               |

+------------+-----------------------------+--------------------------+
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)

If you look carefully, however, you'll see there's a difference between the default that the Rails migration generated and the one generated by hand earlier. The difference is in the function that is being used to generate a UUID—in the manual version, you generated a “version 1” UUID, based on the MAC address of the computer that created it. Rails, by contrast, uses the “version 4” UUID algorithm, which is completely random. The advantage of the v4 UUID is that the number is more random, thus reducing the chance that someone can guess it. However, because the data is random, it'll be slower for PostgreSQL to index it. If you want to tell Rails to use the v1 function, add a line to the migration:


class CreatePeople < ActiveRecord::Migration
  def change
    create_table :people, id: false do |t|
      t.primary_key :id, :uuid, default: 'uuid_generate_v1()'
      t.text :name
      t.text :email

      t.timestamps
    end
  end
end

Note that if you want to run the modified migration, it's probably easiest and best just to drop and re-create the “people” and “schema_migrations” tables. Rails remembers which migrations already have been applied, and it won't re-run one, even if you have modified the file:

\d people
                              Table "public.people"

+------------+-----------------------------+--------------------------+
|   Column   |            Type             |              Modifiers   |
+------------+-----------------------------+--------------------------+
| id         | uuid                        | not null default
                                              ↪uuid_generate_v1() |
| name       | text                        |               |
| email      | text                        |               |
| created_at | timestamp without time zone |               |
| updated_at | timestamp without time zone |               |
+------------+-----------------------------+--------------------------+
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)

With this default in place, your “people” table now will use UUIDs.

Arrays

Arrays are another PostgreSQL feature that is now supported natively by Rails. PostgreSQL has supported arrays for a number of years, and although I personally find the syntax to be a bit difficult to deal with, there's no doubt that arrays can simplify some database designs. (I should note, however, that arrays should be a last resort, because they tend to result in a non-normalized database design, potentially leading to unnecessary duplication of data.) For example, if I want to create a “posts” table for my blog and then allow people to store one or more social tags, I could define it as:

create table posts (
    id       UUID NOT NULL PRIMARY KEY DEFAULT uuid_generate_v1(),
    headline TEXT,
    body     TEXT,
    tags     TEXT[]
);

Notice the data type associated with the “tags” column. By using square brackets after the TEXT type, I've indicated that the column can contain zero or more text strings. For example:

INSERT INTO Posts (headline, body, tags)
VALUES ('my headline', 'my body', '{general, testing}');

Notice that the array value is inserted as a string, the first and final characters of which are curly braces. Now you can get information from the array as follows using square brackets, remembering that unlike many languages, PostgreSQL indexes arrays starting with 1:

SELECT headline, body, tags[1], tags[2] FROM Posts;
+-------------+---------+---------+---------+
|  headline   |  body   |  tags   |  tags   |
+-------------+---------+---------+---------+
| my headline | my body | general | testing |
+-------------+---------+---------+---------+
(1 row)

Notice how you can retrieve each of the tag elements separately by using their index. If you try to use an index for which there is no value, you get a NULL instead. You also can use the ANY operator to find rows in which a particular tag value is assigned:

select headline, body, tags from posts where 'general' = ANY(tags);
+-------------+---------+-------------------+
|  headline   |  body   |       tags        |
+-------------+---------+-------------------+
| my headline | my body | {general,testing} |
+-------------+---------+-------------------+

Note that the ANY operator must be on the right-hand side of the comparison. Otherwise, you'll get a syntax error from PostgreSQL.

There was little or no support for PostgreSQL arrays in earlier versions of Ruby on Rails. But starting with Rails 4, there is support for such functionality. You not only can define a column to contain an array, but you also can use ActiveRecord to manipulate it. First and foremost, let's create a scaffold for the resource:

rails g scaffold post headline:text body:text tags:string

This generates the necessary files. Don't run the migration just yet, however; you first need to turn “tags” from a string into an array of strings and your ID into a UUID:


class CreatePosts < ActiveRecord::Migration
  def change
    create_table :posts, id: false do |t|
      t.primary_key :id, :uuid, default: 'uuid_generate_v1()'
      t.text :headline
      t.text :body
      t.string :tags, array:true, default:[]

      t.timestamps
    end
  end
end

Now you will have a UUID for a primary key, but you also will define tags to be an array. After running the migration, this is what you'll see:

\d posts
                               Table "public.posts"

+------------+-----------------------------+--------------------------+
|   Column   |            Type             |          Modifiers       |
+------------+-----------------------------+--------------------------+
| id         | uuid                        | not null default
                                             ↪uuid_generate_v1() |
| headline   | text                        |               |
| body       | text                        |               |
| tags       | character varying(255)[]    | default '{}'::character
                                             ↪varying[]   |
| created_at | timestamp without time zone |               |
| updated_at | timestamp without time zone |               |
+------------+-----------------------------+--------------------------+
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)

From a database perspective, things seem great; you can perform an INSERT:

INSERT INTO Posts (headline, body, tags, created_at, updated_at)
VALUES ('my headline', 'my body', '{general, testing}', now(), now());

Sure enough, you can see the post in the database. The magic, however, is that ActiveRecord allows you to treat the PostgreSQL array as if it were a Ruby array. For example, you can say:

Post.first.tags.each {|t| puts t}

This tells Rails to ask ActiveRecord for the first record in the Posts table and to call up its “tags” column, which is returned as a Ruby array of strings. You then can iterate over those strings, printing them (or otherwise manipulating them). Although this isn't very efficient or smart, you also can do the following:

Post.all.select {|p| p.tags.member?('foo')}

A more efficient way to do this would be to use the ANY operator that you saw earlier, passed to PostgreSQL in a string:

Post.where("'general' = ANY(tags)").first

Unfortunately, it doesn't seem that you can add elements to a PostgreSQL array using the standard Ruby << (append) operator. Rather, if you want to add one or more elements to an array via ActiveRecord, you must do so manually:

p.update_attributes(tags: ['general', 'testing', 'zzz'])

This is somewhat annoying, but not fatal, particularly not for the first version where this functionality is included.

Summary

Rails 4, although not breaking compatibility with its predecessors nearly as much as Rails 3, does introduce a great deal of new functionality. For me, one of the most interesting areas of this functionality is a shift toward PostgreSQL, with ActiveRecord migrations and functionality abandoning some of its platform independence. In this article, I showed two of the features that are now available, namely UUIDs and arrays. However, there are additional features, such as native support for INET (that is, IP address) data types, for JSON (which PostgreSQL 9.3 supports even better than it did in the past), ranges and even for hstore, a NoSQL-like storage system built on top of PostgreSQL.

No technology, including PostgreSQL, is the right answer for everyone at all times. However, in my experience, PostgreSQL offers excellent performance, features and stability, with a fantastic community that answers questions and strives for correctness. The fact that Rails 4 has embraced many of these features is likely to expose even more people to PostgreSQL, which can only be good for Web and database developers who use open-source products.

Web developer, trainer and consultant Reuven M. Lerner is finishing his PhD in Learning Sciences at Northwestern University. He lives in Modi'in, Israel, with his wife and three children. You can read more about him at lerner.co.il, or contact him at reuven@lerner.co.il.