Content Management

Reuven M. Lerner

Issue #108, April 2003

Give your web site newspaper-grade content management with open-source software.

Remember the good ol' days of the Web? Back when a webmaster was a jack-of-all-trades, doing everything from graphic design to database programming to DNS table manipulation? As the Web matured, however, one-person web sites became increasingly rare. True, it's still relatively easy for someone to create and maintain a simple web site, but even the smallest organizations typically split responsibility among programmers, designers and the people who provide content. Moreover, many organizations want different people to be responsible for different types of content, with each having ultimate authority over a particular section.

Of course, this is old hat to the publishing world. Back when I edited my college newspaper, we used a composition and typesetting system called Atex. Atex was beloved for many reasons but mostly because it worked the way that newspapers do. Reporters using an Atex system would send articles to their editors by pressing the Send button on a massive, specialized keyboard. Editors could look at the list of articles awaiting their attention, edit articles, send an article back to the reporter who wrote it or send articles on to the Typesetting department. By design, everyone was forbidden from viewing, modifying or retrieving articles they had sent to the next person in the process chain. The image of a reporter shouting “stop the presses!” might be romantic and inspiring, but it is also unrealistic in today's world, where newspapers are businesses with tight deadlines.

As web sites grow to resemble newspapers, we should not be surprised to see them adopting software, known as content management systems (CMS), that works much as Atex did. But organizing documents, people and work flow is a difficult task, particularly if you try to put everyone's needs into a single software package. So even though content management is crucial to an increasing number of web sites, CMS salespeople have gained a reputation for selling bloated, expensive software that is long on promises and short on delivery.

What Does a CMS Do?

One of the problems with content management is that every web site has different needs. For this reason, proprietary CMS software usually is sold in two parts. The customer first pays for the basic software and then pays at least as much in consulting and support services. Thus, CMS software is not only expensive but requires a fair amount of implementation and testing time. In other words, a CMS usually is closer to a toolkit than a finished application. Most of these toolkits include the following functionality:

  • Users: if everyone on a web site is going to be given different permissions, obviously each user will need a different login. A CMS thus comes with user-management software, allowing you to create, delete, edit and ban users on the system. Most systems also make it possible for users to retrieve forgotten passwords.

  • Permissions: just as Linux allows you to set read, write and execute permissions on different files, CMS software typically allows the site administrator to define different permissions for each user on the system. To return to our newspaper analogy, reporters are allowed to enter content, editors can modify content (or return it to a reporter) and publishers can make content publicly available (or return it to an editor).

  • Groups: although you theoretically can assign permissions to individual users, this quickly becomes tedious. So, most CMS software allows you to group users together for the sake of assigning permissions. For example, you can indicate that Tom, Dick and Harry can write and edit but not publish, or you can assign these permissions to the Canonical Names group, with the same effect.

  • Templates: many templating systems exist, including JSP, HTML::Mason and PHP. The best ones separate design, content and programming logic from each other, so designers, writers and programmers can work on a site simultaneously without stepping on one another's toes.

  • Publishing: the Web's biggest double-edged sword is its instantaneousness. The moment you modify foo.html on your server, everyone can see what changes you made. What if you made a mistake? What if you want to test the file beforehand? The CMS solution is to mark each piece of content as published when it should be viewed by the outside world. Until an article has been published, it is invisible.

  • Staging or previewing: just as newspaper and magazine publishers want to see what the finished product will look like before they begin to print actual copies, web publishers want to preview their site before it is live on the Web. Thus, many sites run staging servers, identical in most ways to their production servers except they are hidden from the outside world. Testing is done on these preview servers; when the editor or publisher is satisfied, content is pushed to the production servers. A CMS almost certainly will allow you to set up your system in this way.

  • Work flow: staging is the final step in what might be a long journey from an author's workstation to a production web server. How content makes its way through the system is known as work flow, and much of what a CMS does is allow you to define and manage that work flow. Should reporters be allowed to yank stories back from their editors? How many levels of editors do you want? Where do designers fit in? Who gives the final send-off to content? All of these questions are handled by the work-flow portion of a CMS.

  • Publishing dates: the good news about the Web is that things are published instantaneously. But what if your corporation is announcing a stock split and cannot reveal that information until 9:00 AM on Monday? You could sit next to the computer, waiting until the clock strikes 9:00 to press the Enter key and reveal the document for everyone to see. Or you can use a CMS, which typically allows you to specify when an article will appear, as well as when it should expire.

  • Web-based editing: although a web browser is one of the worst possible programs to use for serious text editing, most CMS packages allow you to write some or all of your documents using your browser. To be fair, just about every CMS also lets you upload files from your local computer. Web-based editing comes in handy when you're on the run or want to touch up one or two things. Of course, any CMS that offers such editing facilities also checks that someone trying to edit a page is authorized to do so.

  • Search: most CMS packages offer some sort of search facility, so you can find documents within the system.

Although this list is by no means exhaustive, it should give you a sense of the types of problems that a CMS tries to solve. But as you can imagine, every CMS offers a slightly different set of features and different ways of attacking these problems.
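To make the groups, permissions and work-flow items above a bit more concrete, here is a minimal Python sketch of the reporter/editor/publisher chain from the newspaper analogy. It is not taken from any particular CMS; the state names, the group names and the apply method are all assumptions I made up for illustration, and a real system would keep this information in its database rather than in a dictionary.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    DRAFT = "draft"            # being written by a reporter
    IN_REVIEW = "in_review"    # waiting for an editor
    APPROVED = "approved"      # waiting for a publisher
    PUBLISHED = "published"    # visible to the outside world


# Hypothetical work-flow table:
# (current state, action) -> (group allowed to act, next state)
TRANSITIONS = {
    (State.DRAFT, "submit"): ("reporters", State.IN_REVIEW),
    (State.IN_REVIEW, "approve"): ("editors", State.APPROVED),
    (State.IN_REVIEW, "return"): ("editors", State.DRAFT),
    (State.APPROVED, "publish"): ("publishers", State.PUBLISHED),
    (State.APPROVED, "return"): ("publishers", State.IN_REVIEW),
}


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


@dataclass
class Article:
    title: str
    state: State = State.DRAFT

    def apply(self, user, action):
        """Move the article along the work flow if the user's group allows it."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} an article that is {self.state.value}")
        group, next_state = TRANSITIONS[key]
        if group not in user.groups:
            raise PermissionError(f"{user.name} is not in the {group} group")
        self.state = next_state
        return next_state


if __name__ == "__main__":
    tom = User("Tom", {"reporters"})
    dick = User("Dick", {"editors"})
    harry = User("Harry", {"publishers"})

    story = Article("Stock split announced")
    story.apply(tom, "submit")      # reporter hands the story to an editor
    story.apply(dick, "approve")    # editor passes it to the publisher
    story.apply(harry, "publish")   # publisher makes it public
    print(story.state.value)
```

Because permissions hang off groups rather than individual users, adding a fourth reporter means adding one name to one group. Changing the work flow (say, adding a second level of editors) means editing the transition table, not the code that enforces it.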

Because a CMS spends much of its time storing, retrieving and tracking content, it should come as no surprise that a database is almost essential to a CMS. Commercial CMS packages typically expect you to use a proprietary database system, such as Oracle or Microsoft's SQL Server. As you might expect, open-source CMS software generally is designed to work best with open-source databases, such as MySQL or PostgreSQL. Zope's Content Management Framework (CMF), which is a toolkit for creating a custom CMS, also uses a database, but in this case, it's the built-in Zope Object Database (ZODB) rather than an external relational database.
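As a rough illustration of how a database can carry the publishing and expiration dates described earlier, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The table layout, the column names and the helper functions are assumptions of mine, not the schema of any real CMS, and the timestamps are naive (no time zones) to keep the example short.

```python
import sqlite3
from datetime import datetime


def open_store():
    """Create an in-memory database with a hypothetical articles table."""
    conn = sqlite3.connect(":memory:")
    # publish_at and expire_at hold ISO 8601 strings; a NULL expire_at
    # means the article never expires.
    conn.execute(
        """CREATE TABLE articles (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               publish_at TEXT NOT NULL,
               expire_at TEXT
           )"""
    )
    return conn


def stamp(moment):
    """Format a datetime so that string order matches time order."""
    return moment.isoformat(timespec="seconds")


def add_article(conn, title, publish_at, expire_at=None):
    conn.execute(
        "INSERT INTO articles (title, publish_at, expire_at) VALUES (?, ?, ?)",
        (title, stamp(publish_at), stamp(expire_at) if expire_at else None),
    )


def visible_titles(conn, now):
    """Return titles whose publishing window includes the moment 'now'."""
    rows = conn.execute(
        """SELECT title FROM articles
           WHERE publish_at <= ?
             AND (expire_at IS NULL OR expire_at > ?)
           ORDER BY publish_at""",
        (stamp(now), stamp(now)),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = open_store()
    add_article(conn, "Stock split", datetime(2003, 4, 7, 9, 0))
    add_article(conn, "Winter sale", datetime(2003, 1, 1), datetime(2003, 2, 1))
    print(visible_titles(conn, datetime(2003, 4, 7, 8, 59)))
    print(visible_titles(conn, datetime(2003, 4, 7, 9, 0)))
```

The point of the design is that nobody has to press Enter at 9:00. The article sits in the database ahead of time, and the query that builds each page simply ignores anything outside its publishing window.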

Content Management vs. Application Development

If you have ever developed serious web applications, you immediately will see a large degree of overlap between the features a CMS offers and the features you expect from a web application server. Most CMS software sits on top of a web application server, using its underlying infrastructure to handle HTTP connectivity, users, groups, permissions and even the database API. In some ways, CMS was the first popular class of application to be deployed on the Web, much as spreadsheets were the first applications used on personal computers.

Overall, it's a good thing CMS software is written on top of an application server, especially in the open-source world. This means you can add new modules to the core CMS, handle new types of documents, change the templates, extend the database and add new types of permissions and work-flow rules. But it's important to remember the difference between an application server and a CMS. The former provides the infrastructure for creating applications, and the latter is an application you can customize.

So if you're looking to create a web-based newspaper, magazine or corporate news site, a CMS is undoubtedly the right type of software for you. But if you want to create a web-based application that tracks donations to your favorite charity, a CMS probably won't provide the flexibility you need. The difference between web applications and web publications has always been a murky one, but as web applications become increasingly sophisticated, CMS software will be recognized as one type of product you can run on a web platform.

Because content management systems normally run on top of an application server, your choice of CMS might depend on the type of server on which it runs. Many companies have moved to J2EE (Java 2 Enterprise Edition) as their underlying platform. Indeed, the well-known Vignette CMS originally was designed to work with Tcl but migrated to J2EE when the buzz surrounding J2EE became too great to ignore. Because J2EE is a standard, rather than a product, customers can choose application servers and CMS software separately. You can use the open-source Tomcat/JBoss duo or the proprietary offerings from companies like BEA or IBM.

If you dislike Java, or if your development team is more familiar with another set of technologies, you might consider a non-J2EE CMS. Such products do exist, including Zope's CMF, the CMF-based Plone, Bricolage (Perl/PostgreSQL), PHPNuke/PostNuke/Xoops (PHP) and Midgard (PHP), and we will look at several of them in the coming months.

Regardless of what technology you decide to use, a CMS is increasingly necessary and useful for producing web sites. Even if you're the only person working on your web site, adopting a CMS is probably wise, if only to help standardize the look, feel and delivery of content on your site. And, if you ever decide to add new types of content, the CMS will probably be able to handle them, though you might need to tinker with it somewhat.

Conclusion

CMS software is probably the first type of application designed for the Web. Most content management solutions are expensive and proprietary, but an increasing number of open-source options are available for those who want greater freedom and lower cost. Given that content management systems normally need a great deal of customizing and tuning, this is another niche for which open-source tools are an excellent fit.

Reuven M. Lerner (reuven@lerner.co.il) is a consultant specializing in open-source web/database technologies. He and his wife Shira recently celebrated the birth of their second daughter, Shikma Bruria. Reuven's book Core Perl was published by Prentice Hall in early 2002, and a second book about open-source web technologies will be published by Apress in 2003.