At the Forge

Database Modeling with Django

Reuven M. Lerner

Issue #161, September 2007

Let Django and its object-relational model do the SQL database work for you.

The past few months, this column has been examining Django, a popular open-source Web development framework written in Python. Django sometimes is described as a rival to Ruby on Rails or a Python version of Rails, but just as Python and Ruby are distinct languages, each with its own strengths and weaknesses, Django and Rails are different frameworks, and each has its own set of trade-offs.

If you have been following this series of columns about Django, you already have seen how to download and install the Django software, how to create and configure a site and application, and even how to create views (Python methods that handle the business logic) and templates (HTML files with special rules for interpolating variables and dynamic content). With everything we've looked at so far, you could presumably create an interesting dynamic Web application.

However, most modern Web applications have another component, a relational database, on which they rely for data storage and retrieval. Sure, you could store everything on the filesystem or even in memory, but for most of us, a relational database is the path of least resistance, ensuring the safety of our data while providing a great deal of flexibility in retrieving it.

This month's column, then, looks at the ways in which Django programmers can store and retrieve information in a database. If you have worked with databases only from PHP or CGI programs, you will be surprised and impressed by the degree of automation Django provides. If you have worked with Ruby on Rails, you probably will think the Django programmers are working too hard—to which Django hackers would say that they want to have full control over their application, rather than rely on behind-the-scenes magic.

Creating a Model

The term model in the Django world describes a Python object for which there is a persistent state, presumably stored in a relational database. We don't need to use models to integrate a database into Django, but it would be difficult (not to mention unaesthetic) if we were simply to stick SQL queries into our templates. So instead, we use Django's built-in object-relational mapper, working solely with objects from within our views and templates. The mapper's job is to translate our method calls into SQL and then translate the resulting database response into Python objects.

But, before we even can create our model object, we first must have a database table to which the object will connect. Django requires that we define our model using Python code, describing the table's name, fields and even some default values.

If we were interested in keeping the blog application we started last month, we probably could define our table in PostgreSQL as follows:

CREATE TABLE Posting (
id        SERIAL    NOT NULL,
title    TEXT    NOT NULL,
body    TEXT    NOT NULL,
posted_at    TIMESTAMP NOT NULL DEFAULT NOW(),

PRIMARY KEY(id)
);

But in Django, we don't create the above SQL directly. Rather, we use Python to create it for us. For example, we can define the above table in Django as follows:

from django.db import models

class Posting(models.Model):
    title = models.CharField(maxlength=30)
    body = models.TextField()
    publication_date = models.DateTimeField()

Our model is a Python class, which inherits from django.db.models.Model. We define each field with a particular type, using objects that we imported from django.db.models. As shown above, some of these data types can be restricted or modified from their defaults by passing parameters. Some are defined specifically because they have built-in limits, such as EmailField, which must be a valid e-mail address. By defining columns with ManyToManyField and ForeignKey objects, it's possible to define a variety of relationships among tables.

The above code should be placed in models.py, a Python file that sits within our application's directory (blog in this case), which itself sits inside our Django site directory (mysite in this case). Thus, the models for my blog application reside in mysite/blog/models.py, whereas the models for a poll application would reside in mysite/poll/models.py.

Notice that we don't have to define a primary key, which traditionally is called id and is a nonrepeating integer. (In PostgreSQL, we set it to have a SERIAL data type, which gives the column a default value taken from a newly created sequence object. In MySQL, you would set the column to AUTO_INCREMENT, which has some of the same capabilities as a sequence.) Django creates the id column for us automatically. Django handles potential namespace conflicts by prefacing the table name with the application name. So, the posting table within the blog application becomes the blog_posting table.

Now, how do we turn our Python code into SQL? First, we have to be sure Django knows which database to use. If you have been following along since my first Django article in the July 2007 issue, you already have added the appropriate lines to settings.py, a site-wide configuration file in which we define the database type, name, user and password. Here are the values that I have installed:

DATABASE_ENGINE = 'postgresql'
DATABASE_NAME = 'atf'
DATABASE_USER = 'reuven'
DATABASE_PASSWORD = ''
DATABASE_HOST = ''
DATABASE_PORT = '5433'

It's also important to check that the application is defined in INSTALLED_APPS, a tuple of strings. On my system, INSTALLED_APPS looks like this:

INSTALLED_APPS = (
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'django.contrib.admin',
'mysite.blog'
 )

Notice the clear namespace distinction between my application (mysite.blog) and the applications that are included with Django (django.contrib.*).

Before we turn our Python code into SQL, we first should check to make sure it passes some basic sanity and validation checks. To do that, we go to our site's home directory, and type:

python manage.py validate

If all goes well, Django will report that there aren't any errors. Now that our model has been validated, we can use it to create SQL. The easiest way to do this is with the sqlall command to manage.py:

python manage.py sqlall blog

This produces the SQL output for our database driver (PostgreSQL, in this case). For example, this is the output that I see on my system:

BEGIN;
CREATE TABLE "blog_posting" (
    "id" serial NOT NULL PRIMARY KEY,
    "title" varchar(30) NOT NULL,
    "body" text NOT NULL,
    "publication_date" timestamp with time zone NOT NULL
);
COMMIT;

To their credit, the Django developers wrap the CREATE TABLE statement between BEGIN and COMMIT, ensuring that the table creation will take place in a transaction and will be rolled back if there is a problem. This isn't an issue when creating only one table, but if we have several models, it's always best to leave the database in a consistent state.

One way for us to use the output from sqlall to create tables is to copy it from the terminal window and then paste it, either into a file or into the psql client program. But, Django provides the syncdb utility to do this for us:

python manage.py syncdb

The output from this reassures us that all is well:

Creating table blog_posting
Loading 'initial_data' fixtures...
No fixtures found.

And, sure enough, now we can see that our table has been added:

atf=# \d blog_posting
 id               | integer               | not null
                default nextval('blog_posting_id_seq'::regclass)
 title            | character varying(30) | not null
 body             | text                  | not null
 publication_date |  timestamp with time zone | not null
Indexes:
"blog_posting_pkey" PRIMARY KEY, btree (id)

Voilà! We now have a model that we can access via Python methods, but that exists in our relational database.

Inserting Data

Now that our data model is in place, let's see how we can work with it. Given that our model is brand new, and that there is no data currently stored in it, let's begin by adding some data to it.

In last month's column, we saw how each URL request in Django results in the invocation of a method. Which method is invoked depends on the settings of urls.py, a site-wide configuration file that tells Django what application and method should be associated with what URL.

One way to add data to our blog database, and to get some practice working with the various components of Django, is to do so via a view and template. Normally, I would demonstrate how to do this with an HTML form, but for space reasons, I use a simpler (and more contrived) way, inserting dummy data into the database.

The first step is to add a new line to the definition of the urlpatterns variable, defined in urls.py:

('^blog/add_dummy_data', 'mysite.blog.add_dummy_data')

Now, we can go to the URL /blog/add_dummy_data, and Django will invoke the blog.add_dummy_data method. The beginning of this method is quite simple, namely:

def add_dummy_data(request):

The name of the method is obvious from the configuration file. The number of parameters is determined by the number of parenthesized groups in urlpatterns.

Now what do we do? If we were dealing with raw SQL, I would suggest the following:

INSERT INTO Posting
    (title, body, posted_at)
VALUES
    ('Dummy 1 headline', 'This is my first blog post',
        NOW - interval '1 hour');

INSERT INTO Posting
    (title, body, posted_at)
VALUES
    ('Dummy 2 headline', 'This is my second blog post',
        NOW());

These will insert two rows into the Posting file: the first with a timestamp from one hour ago and the second with a current timestamp.

But, we don't want to use SQL. We want to use Python, creating objects that automatically map to these INSERT statements. So, it makes sense that all we have to do is create new instances of the Posting object, passing it appropriate parameters. And, sure enough, we can do that:

p = Posting(title='Dummy 1 headline', body='This is my 
first blog post', posted_at=(datetime.now() 
- timedelta(0, 0, 0, 0, 1)))
p.save()

p = Posting(title='Dummy 2 headline', body='This is my 
second blog post', posted_at=datetime.now())
p.save()

If you are an experienced Python programmer, the above code shouldn't be very surprising at all. We simply are creating two new instances of Posting, passing arguments that will set the object's attributes. Then, we invoke the save() method on each posting, which presumably saves the posting to disk.

Finally, we finish our method with:

return HttpResponse("Created blog posts.")

With the method (shown in Listing 1) defined, start up the server:

python manage.py runserver  69.55.232.87:8000

Then, point the Web browser to the URL defined in urls.py, and get the message:

Created blog posts.

Next, check the database, just to be sure:

atf=# \x
Expanded display is on.

atf=# select * from blog_posting;

-[ RECORD 1 ]----+------------------------------
id               | 1
title            | Dummy 1 headline
body             | This is my first blog post
publication_date | 2007-06-15 16:13:34.609396-05

-[ RECORD 2 ]----+------------------------------
id               | 2
title            | Dummy 2 headline
body             | This is my second blog post
publication_date | 2007-06-15 16:14:34.675235-05

As you can see, we were able to create these new objects successfully and store them in the database.

Retrieving Data

Now that we've created these objects, let's see if we can retrieve and display them—a pretty typical thing to do if you write applications in Django. Because the most common thing you might want to do with a blog is display all of the postings in reverse chronological order, we write our index method to do that. If you still don't have an entry in urls.py for index, make sure there is a line that looks like the following in the definition of urlpatterns:

(r'^blog/$', 'mysite.blog.views.index'),

Now, we open up views.py to create our index method. The first task in that method is to get all the postings. Django makes that trivially easy to do:

postings = Posting.objects.all()

This retrieves all the instances of Posting (which happen to be stored as rows in our database) and assigns them to the variable postings. This variable isn't a list, but an instance of a QuerySet object. We most likely will want to iterate over the QuerySet, but we can perform other operations on it, such as reordering it or retrieving selected elements.

We also can select particular items from the database. This is done with two methods: one called filter (which returns objects that match a restrictive function) and one called except (which does the opposite, returning objects that are false for a function). Both filter and except take a large number of parameters, built up dynamically by joining column names with various functions. The column name and function name are joined with a double underscore (__).

For example, we can get only those postings from this year:

this_year_postings = Posting.objects.filter(
    publication_date__gte=datetime(2007, 1, 1))

Sure enough, this returns both of our postings. Because filter and except return QuerySet objects, we can chain them together, creating just the query we want in Python code.

But, what if we want only the most recent posting? If you're thinking there will be a “limit” feature, you've been working at the SQL level (or in Rails) for too long. Because QuerySets use lazy evaluation, you simply can say:

this_year_postings = Posting.objects.filter(
    publication_date__gte=datetime(2007, 1, 1))[0]

We similarly can order our objects by using the order_by method on them, which can be chained along with filter and exclude:

latest_posting = Posting.objects.filter(
    publication_date__gte=datetime(2007, 1, 
1)).order_by('-publication_date')[0]

Notice that we put a minus sign (-) before the word publication_date. This tells Django we want to order the results in reverse.

Django has a wealth of such methods, giving both a great deal of flexibility in constructing your queries and a rich Python API that allows you to ignore the low-level SQL calls almost entirely.

Finally, we can get information out of our object as we would retrieve it from any Python object:


output += "<h1>%s</h1>\n" % posting.title
output += "<h2>%s</h2>\n" % posting.publication_date.isoformat()
output += "<p>%s</p>\n\n\n" % posting.body

If we put this all together, as shown in Listing 2, we'll have a view method (albeit without a proper Django template) that shows each of the blog postings.

You can try all of these database queries for yourself using Django's shell:

python manage.py shell

Using the Django shell, as opposed to the straight interactive Python interface, ensures that Django-related classes and paths are preloaded, making it possible to query and modify the database from within Python interactively. This is a good way to experiment with new code that you are thinking of adding to a view method, without having to place it in a file.

Conclusion

Django provides a high-level interface for the definition of database models using Python, rather than SQL. This high-level API permeates the framework, making it possible to work exclusively in Python. Moreover, the API includes many convenience functions and data types that make it relatively natural to work in this way. Creating database-backed Web applications with Django is dramatically easier and better than with most frameworks I've used, although it is similar in style to Ruby on Rails. Whether you should use Django or Rails is a matter of personal taste and also depends on what others in your organization are using, but there's no doubt that if you're a Python Web/database hacker, Django is worth a very serious look.

Reuven M. Lerner, a longtime Web/database consultant, is a PhD candidate in Learning Sciences at Northwestern University in Evanston, Illinois. He currently lives with his wife and three children in Skokie, Illinois. You can read his Weblog at altneuland.lerner.co.il.