Content Management

Reuven M. Lerner

Issue #77, September 2000

Keep track of updates to your web site documents with this Mason application.

Over the last few months, we have looked at Mason, the web development system written by Jonathan Swartz that combines Apache, mod_perl and HTML/Perl templates. Normally, Mason is associated with web applications, especially those that use back-end databases and produce dynamic content. Indeed, I have found Mason to be an extremely useful tool in my consulting practice, making it possible to create web sites quickly and easily.

But Mason can be used for more than simple server-side web applications. One of the most interesting Mason applications is a set of components known as the “Mason content manager” or “Mason-CM”. Mason-CM provides basic content management functionality, beginning with its ability to handle the staging of files to a production server, and continuing with built-in support for spell checking, RCS version control and editing of Mason components.

While Mason-CM is written in Mason and thus requires Apache, mod_perl and the appropriate Perl modules, it can work just fine with a static site using nothing but HTML and graphic images. In fact, I recommend that even site owners uninterested in Mason and mod_perl take a look at the Mason content manager, because it provides so many useful features in a simple, free package.

Why Content Management?

As web sites have become increasingly complex, so too have the organizations surrounding them. Whereas it used to be common for a professional site to be a one-person operation, it is now usual for even a small or medium-sized site to have at least three people: a writer/editor, a designer and a programmer. Even such a small team will eventually find multiple members trying to edit the same file simultaneously. This problem was solved years ago by version control systems, such as RCS and CVS—but the tools were designed by and for programmers, and can often be daunting for a designer or editor to use.

Moreover, the Web presents a number of challenges that are different from development in the traditional software world. For example, software is traditionally written, compiled, tested and debugged, at which point the cycle begins again until the software is released. The Web, however, works differently. As soon as an HTML file in the web document hierarchy is edited, it is immediately available to everyone on the Internet. This is good news when someone finds a mistake needing to be fixed quickly, and also means that sites can update their content on a regular basis without having to go through long processes. However, it also means that the results of editing a file—whether they are improvements or mistakes—are immediately available to anyone who happens to enter the right URL at the wrong time.

For all of these reasons, most medium- and large-scale web sites now run two web servers. The first, known as a “staging server”, is where the developers, editors and designers can write, edit and test their changes to the site. Only after files have been fully tested are they sent to the second web server, known as the “production server”.

Of course, these two servers need not be on separate computers. They merely need to be separate in some way, to keep outsiders from seeing the content as it is tested and to ensure that the staging and production servers have parallel directory structures.

Mason-CM is a set of Mason components that makes it relatively easy for anyone to set up a content management system on their own server. The user interface is not beautiful, and I had some minor problems getting it installed on my system. However, it does the job more than adequately, and makes it possible for multiple people to work on a project without stepping on each other's toes.

Installing Mason-CM

Before you can install Mason-CM, you will need to have a working copy of Apache with mod_perl and Mason installed. (If you need help installing Mason, see the last three months of “At the Forge”.) In addition, you will need to download and install several Perl modules from CPAN, including MLDBM, Image::Size, URI::Escape and File::PathConvert. Make sure to import these in your Apache configuration file (httpd.conf) with the PerlModule directive, so as to increase the amount of memory shared among the various Apache child processes.
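
Before touching httpd.conf, it can help to confirm that the prerequisite modules are actually installed. The following is a minimal sketch, not part of Mason-CM: the missing_modules function is a hypothetical helper, and the module list simply repeats the names given above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the modules in @_ that cannot be loaded.
# (Hypothetical helper; it is not part of Mason or Mason-CM.)
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm, the form require() expects
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Modules named above as Mason-CM prerequisites
my @wanted = qw(MLDBM Image::Size URI::Escape File::PathConvert);

my @missing = missing_modules(@wanted);
print @missing
    ? "Missing from this Perl: @missing\n"
    : "All prerequisite modules were found.\n";
```

Any module reported as missing needs to be installed from CPAN before Mason-CM will run.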

Mason-CM comes as a gzipped tar file from the Mason home page (see Resources). The archive should be unpacked into a directory under your Mason component root. I chose to put it in /usr/local/apache/mason/cm, where /usr/local/apache/mason is my Mason component root directory. If you have unpacked the archive correctly, the /cm directory should contain a README and an INSTALL file. I will cover all of the steps needed for installation, but it is probably a good idea to read through those files just in case.

Because Mason-CM must ensure that each file is being edited by only one user at a time, and because a content-management system should be available to authorized users only, you must restrict access to the /cm directory using HTTP authentication. Mason-CM will refuse to run unless it is in a directory that has been password protected.

The easiest way to password-protect a directory is to create a .htaccess file in it. The .htaccess file overrides the default Apache configuration settings for its directory, as well as any subdirectories underneath it. For example, here is the .htaccess file for my content management system:

AuthName "Content Management System"
AuthType Basic
AuthUserFile /usr/local/apache/conf/staging-passwords
require valid-user

The AuthName directive defines the string that will be displayed to users when prompted for a password. (Without such a string, it might be hard for users to remember which system is requesting a user name and password.) The AuthUserFile directive should point to a file containing user names and encrypted passwords.

The password file should be outside of the web document root directory, so that users cannot retrieve it using their web browsers. To create or edit the password file, you will need to use the “htpasswd” program, which is installed by default in /usr/local/apache/bin.

The “require valid-user” directive tells Apache to deny access to the directory unless the user supplies a user name and password matching an entry in AuthUserFile.

If the .htaccess file does not have any effect, check the AllowOverride directive in httpd.conf. This directive indicates which Apache configuration options can be overridden by an .htaccess file. By default, Apache servers are configured not to let .htaccess files modify directives of type AuthConfig. You can change this by putting the following section in httpd.conf:

<Directory /usr/local/apache/mason/cm>
AllowOverride AuthConfig
</Directory>

Finally, I found that there was a small bug in my installation of Mason-CM. Many of the components lack a .html suffix, making it difficult or impossible for Apache to identify the content as “text/html”. So even though Mason-CM components produced output in HTML-formatted text, the contents were interpreted by my browser as if they were in unformatted ASCII. To give Apache some help, I explicitly set the content type using the mod_perl “content_type” method. The following autohandler, placed inside of the /cm directory, automatically sets the content type to “text/html” for each document within the directory:

<% $m->call_next %>
<%init>
$r->content_type("text/html");
</%init>

Because my mason.pl configuration file is set to ignore non-text files, I can be sure that the above will not accidentally force JPEG and PNG images to be rendered with a type of “text/html”.

Configuration

Mason-CM is now in place. However, we need Apache to load a number of Perl modules in order for it to work. In your Mason configuration file (which I call “mason.pl”, but the Mason documentation calls it “handler.pl”), insert the following block of Perl code:

{
   package HTML::Mason::Commands;
   use Fcntl;
   use MLDBM;
   use Image::Size;
   use URI::Escape;
   use File::PathConvert;
   use File::Copy;
   use File::Find;
   use IO::Handle;
   use IPC::Open2;
}

Now that we have told Apache and Mason where to find Mason-CM, it is time to configure Mason-CM itself. Nearly all of the configuration is performed by modifying the cmConfig component, located in the /cm directory. As of this writing, cmConfig is written using the old-style Mason interface, which may seem a bit foreign to those of us who started using Mason with version 0.80. For example, the initialization block is called <%perl_init> rather than simply <%init>, and one component invokes another with mc_comp rather than $m->comp. Nevertheless, the component should be relatively easy to recognize and understand by anyone with even a minimum amount of experience with Mason.

The two main variables that must be set at the top of cmConfig are $CM_HOME and $CM_DATA. (These are defined on line 25 of the default version of cmConfig, at the top of the <%perl_init> section.) The first refers to the directory in which Mason-CM is installed. The second refers to the directory in which Mason-CM can store information on the files it manages, such as locking and version control information. On my system, I defined them as follows:

my $CM_HOME = '/usr/local/apache/mason/cm';
my $CM_DATA = '/usr/local/apache/cmdata';

Both of these directories must exist in order for Mason-CM to work. While $CM_HOME should already exist (since cmConfig is supposed to be inside of $CM_HOME), you may need to create the $CM_DATA directory. Note that this directory is different from the Mason data directory, which I typically put in /usr/local/apache/masondata.

Following the definitions of $CM_HOME and $CM_DATA is a large hash, called %cm_config. The keys in %cm_config describe different configuration options, and the values are the settings for those options. In most cases, the default options are probably adequate; we will discuss only those options that you must or should change.

The “admin” key refers to the e-mail address of the Mason-CM administrator. The administrator is responsible for undeleting files, unlocking locked files and generally managing the content management system. By default, this is set to be cm-admin, but you can change the value to something else.

Defining Branches

The value associated with the “branches” key is an array reference describing the various “branches” on the version control system. While all of the branches must be under $CM_HOME, this makes it possible to differentiate between subsites. For example, a newspaper might have separate branches for the news, sports and business sections. Each branch is identified by a unique name, followed by a hash reference identifying different characteristics associated with the branch. For example, here is what the “branches” key looks like for a web site with a single branch, called “Primary”:

branches => [
      Primary => {
          path =>'/usr/local/apache/htdocs/staging/content',
          trg_from => 'staging',
          trg_to => 'production',
          components => 0
      }
   ]

The above branch will be displayed in the Mason-CM “branch selector” as “Primary”, and controls all of the documents under /usr/local/apache/htdocs/staging/content. Make sure the named directory does not end with a “/”, or Mason-CM will fail with a security violation.

The “trg_from” and “trg_to” keys are used in a simple substitution, indicating that to move documents from the staging server to the production server, we replace the string “staging” with the string “production”. (Mason-CM calls the staging process “triggering”.) Thus content is initially placed in /usr/local/apache/htdocs/staging/content, and is staged to the directory /usr/local/apache/htdocs/production/content. Finally, we indicate that this branch contains static HTML (rather than Mason components) by setting the “components” key to 0.
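
The substitution described above can be sketched in a few lines of Perl. This is only an illustration: production_path is a hypothetical function, and it assumes that the “trg_from” value is treated as a literal string and that only its first occurrence in the path is replaced. Mason-CM's own code may differ on both points.

```perl
use strict;
use warnings;

# Map a staging path to its production counterpart by replacing the
# first occurrence of $from with $to. \Q...\E makes $from literal, so
# regular-expression metacharacters in it have no special meaning.
# (Hypothetical helper; assumed behavior, not Mason-CM's actual code.)
sub production_path {
    my ($path, $from, $to) = @_;
    (my $target = $path) =~ s/\Q$from\E/$to/;
    return $target;
}

my $staged = '/usr/local/apache/htdocs/staging/content/index.html';
print production_path($staged, 'staging', 'production'), "\n";
```

Run as is, the sketch prints /usr/local/apache/htdocs/production/content/index.html. Under these assumptions, a “trg_from” string that also appears earlier in the path would be replaced in the wrong place, so it is worth choosing directory names that make the string unambiguous.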

A more complicated site might set branches to the following value:

branches => [
        News => {
            path => '/usr/local/apache/htdocs/staging/news',
            trg_from => 'staging',
            trg_to => 'production',
            components => 1,
            hidden => 1
        },
        Business => {
            path => '/usr/local/apache/htdocs/staging/business',
            trg_from => 'staging',
            trg_to => 'production',
            components => 1,
            obj_dir => '/usr/local/apache/staging/obj',
            hidden => 1
        }
   ]

The above Mason-CM configuration has two branches, known as “News” and “Business”. Because “branches” is an array reference rather than a hash reference, its elements are kept in their original order. This means the branch selector will display the branches in the order they are entered into branches above. Changing the order in which branches are displayed is as easy as modifying the order of elements in the branches array reference.
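
To see why the array reference preserves order, consider how such a structure is walked: its elements alternate between a branch name and that branch's settings. The sketch below uses a hypothetical branch_names function and a trimmed-down copy of the configuration; it illustrates the data structure and is not taken from Mason-CM.

```perl
use strict;
use warnings;

# A trimmed-down configuration in the same shape as the one above
my %cm_config = (
    branches => [
        News     => { path => '/usr/local/apache/htdocs/staging/news',     components => 1 },
        Business => { path => '/usr/local/apache/htdocs/staging/business', components => 1 },
    ],
);

# Return the branch names in the order they appear in the array reference.
# (Hypothetical helper, for illustration only.)
sub branch_names {
    my ($branches) = @_;
    my @names;
    # Elements alternate: name, settings, name, settings, ...
    for (my $i = 0; $i < @$branches; $i += 2) {
        push @names, $branches->[$i];
    }
    return @names;
}

print join(', ', branch_names($cm_config{branches})), "\n";   # News, Business
```

Had “branches” been a hash reference, calling keys on it would return the names in no particular order.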

If we set up the branches using the above system, we can then modify our Apache configuration such that it takes any URL beginning with /news and rewrites it as /production/news:

Alias /news /usr/local/apache/htdocs/production/news

Now the staging server is hidden from view via a web browser. We can, however, configure our web server such that all requests to port 8080, or any other port we choose, are directed toward the staging server.

The “hidden” key determines whether the branch is displayed by default in the branch selector. Normally, all branches are displayed by default and are available to all users. Any user can customize the list of branches using the “my.CM” link in the upper right-hand corner of the Mason-CM index page, adding and removing branches from his or her menu. However, making a branch hidden by default gives new users a relatively clean view of the content management system.

Unlike the “Primary” branch, “News” and “Business” are defined to contain Mason components. Staging a component is different from staging a static page of HTML, in that Mason-CM will try to compile the component and test it for errors before actually allowing it on the production server. In this way, a broken component will not cause the production web site to fail, but rather only the staging server. If we want to store the compiled Mason components in a specific directory, we can specify that with the “obj_dir” key.

cmConfig can be modified in many other ways to customize your Mason-CM. However, once you have configured $CM_HOME, $CM_DATA and branches, you can begin to use Mason-CM.

The Index Page

To access the main Mason-CM interface, point your web browser at $CM_HOME. On my system, I opened the URL http://localhost/mason/cm/.

Because this directory is password-protected, I had to enter a user name and password. Following a successful login, I was presented with the main Mason-CM index screen. To a large degree, the index page is a web-based file browser, allowing you to navigate through the directories and subdirectories in the various defined branches, open files for reading and writing, and search for a file by file name or content.

The index screen is easily identified by the picture of a juggler. While you can replace this with any image you prefer (setting the “juggler_src” key in cmConfig), the image seems rather appropriate for those of us who work on web sites! Clicking on this image from anywhere in Mason-CM brings you back to this main index page.

Along the right-hand side of the index page is the branch selector, listing the branches that were defined in cmConfig. Clicking on a link within the branch selector allows you to handle staging for that particular branch. The current branch appears in a slightly different background color from the other branches, so your current location should always be fairly obvious.

The current directory is identified in the middle of the screen, with the “current directory” headline (and a blue default background). Each component of the current directory path is a hyperlink to that path, making it possible to navigate using the mouse. To switch into a subdirectory, merely click on its name. Alternatively, you can create a new subdirectory by using the text field in the middle of the screen.

Above the “current directory” line is a search system. I am obviously not the only person who has ever reverted to grep and find after failing to remember where a particular file is stored on a web site. Mason-CM puts both of these programs into an easy-to-understand package, allowing even non-UNIX users to search for files within the current branch. The search supports Perl regular expressions, meaning you can look for files by name or by content in a variety of ways. Be careful about what you search for, however; Mason-CM will happily search through hundreds of files for a complex regular expression, even if the execution will take a long time.
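
For readers curious about what a combined find-and-grep looks like in Perl, here is a minimal sketch using the standard File::Find module. The search_branch function is hypothetical and is not Mason-CM's implementation; it simply returns every file under a directory whose name or contents match a regular expression.

```perl
use strict;
use warnings;
use File::Find;

# Return the files under $dir whose name or contents match $pattern.
# (Hypothetical helper; Mason-CM's own search code may work differently.)
sub search_branch {
    my ($dir, $pattern) = @_;
    my $re = qr/$pattern/;
    my @hits;
    find(sub {
        return unless -f $_;             # plain files only
        my $path = $File::Find::name;
        if ($_ =~ $re) {                 # match on the file name
            push @hits, $path;
            return;
        }
        open my $fh, '<', $_ or return;  # skip unreadable files
        while (my $line = <$fh>) {
            if ($line =~ $re) {          # match on the contents
                push @hits, $path;
                last;
            }
        }
        close $fh;
    }, $dir);
    my @sorted = sort @hits;
    return @sorted;
}

# Usage: perl search.pl /path/to/branch 'regular expression'
if (@ARGV == 2) {
    print "$_\n" for search_branch(@ARGV);
}
```

Because every file's contents may be read line by line, the cost grows with the size of the branch, which is why a complex expression over hundreds of files can take a while.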

Beneath the “current directory” line is a list of files available within the current directory. Each file is identified by name, by its last modification date, by the person who performed the last modification and by the file's current status. The status is one of “staging” (meaning it exists on the staging server only), “prod” (the file is identical on the staging and production servers) and “modified” (the file exists on both servers, but has been changed on the staging server).
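
The three status values can be derived by comparing the two copies of a file. The following sketch shows one way to do so with the standard File::Compare module; the file_status function is hypothetical, and Mason-CM may well determine the status by other means.

```perl
use strict;
use warnings;
use File::Compare;

# Classify a file using the three status values described above.
# (Hypothetical helper; not necessarily how Mason-CM decides.)
sub file_status {
    my ($staging_file, $production_file) = @_;
    return 'staging' unless -e $production_file;   # never triggered
    # compare() returns 0 when the two files have identical contents
    return 'prod' if compare($staging_file, $production_file) == 0;
    return 'modified';                             # the copies differ
}

# Usage: perl status.pl staging/file.html production/file.html
if (@ARGV == 2) {
    print file_status(@ARGV), "\n";
}
```

Note that compare() returns -1 on an error such as an unreadable file, which this sketch also reports as “modified”.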

You can also create a new file, using a text field and the “create” button, just before the list of existing files. Do not confuse the subdirectory create button with the file create button; I modified the button definitions in the “dirTable” and “fileTable” components, so that they say “create subdirectory” and “create file”, respectively.

Viewing and Editing Files

To view the current version of a file, click on its name in the file table. The HTML source will be displayed at the top of the browser window. At the bottom of the page, you can ask to see a rendering of the HTML, to wrap text after 80 columns (rather than displaying the text verbatim), or to display line numbers along with the HTML source.

You can also edit files from within Mason-CM, using a primitive but functional text editor. To edit a file, click on the “edit” link next to the file name. This will bring up a <textarea> widget containing the file's contents. You can modify the contents of the file by typing into the <textarea> field, and can even copy or rename the file using the text field at the top of the page.

This editor is nearly as primitive as things can get, with a barely functioning set of Emacs key bindings for cursor movement. However, the fact that it lets you make modifications easily and quickly is certainly an advantage. And it is integrated into the rest of Mason-CM, in a format that most designers and editors can understand comfortably.

From the editing screen, you can choose from a number of options:

  • The “save” button updates the file, and returns to the editing screen.

  • The “save and exit” button saves the file to disk, and returns to the main Mason-CM index page.

  • The “save and render” button displays the HTML output produced by the file, and can be used to preview the way a particular page or Mason component will work.

  • Finally, the redraw button at the bottom of the screen makes it possible to resize the <textarea> widget, adjusting its height and width.

Mason-CM uses a locking mechanism to ensure that only one user can edit a file at a given time. If you are editing a file, it is noted inside a red box at the top of the index page. That box lists the files on which you're currently working, offering links to the file editor and to the “unlock” page.

If you try to edit a file that someone else is editing, Mason-CM will refuse to display the editing screen. Once the file is unlocked, another user will be able to modify it again.
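
The idea behind this kind of locking can be illustrated with lock files that are created atomically. The sketch below is an assumption-laden example, not Mason-CM's mechanism (which keeps its locking information under $CM_DATA in a format not described here); the functions lock_file, lock_owner and unlock_file and the “.lock” suffix are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Try to lock $file for $user by creating "$file.lock". O_EXCL makes
# the creation fail if the lock file already exists, so two users
# cannot both succeed. Returns 1 on success, 0 otherwise.
sub lock_file {
    my ($file, $user) = @_;
    sysopen(my $fh, "$file.lock", O_WRONLY | O_CREAT | O_EXCL) or return 0;
    print {$fh} "$user\n";
    close $fh;
    return 1;
}

# Name of the user holding the lock, or undef if the file is unlocked.
sub lock_owner {
    my ($file) = @_;
    open my $fh, '<', "$file.lock" or return;
    my $owner = <$fh>;
    close $fh;
    chomp $owner if defined $owner;
    return $owner;
}

# Release the lock; returns 1 if a lock file was removed, 0 otherwise.
sub unlock_file {
    my ($file) = @_;
    return unlink("$file.lock") ? 1 : 0;
}
```

With these helpers, an editor would call lock_file before displaying the editing screen and refuse to continue if it returns 0, reporting lock_owner's result to the user.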

Staging Files

Once a file appears to work correctly on the staging server, you must move it to the production server. To do this, select one or more files within a directory with the checkboxes on the left-hand side of the file table. Then, click on the “trigger” button at the bottom of the page. The files will be copied over to the production server, instantly making them the “current” copies of the web site.

You can trigger all the files in a directory by clicking on the “check all” checkbox at the bottom of the page, next to the trigger button. This is particularly useful when you have created a new directory and want to trigger all of the items at once.

If someone happens to modify the production version of a file, it is possible to “reverse-trigger” the file. This copies the file from the production server onto the staging server. This is a potentially dangerous operation, and should not be treated lightly; as a result, Mason-CM asks for explicit confirmation before allowing such an operation.
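
At bottom, both triggering and reverse-triggering are file copies between two parallel directory trees. The following sketch shows the core of such a copy using the standard File::Copy and File::Path modules. The trigger_file function is hypothetical, and it leaves out the confirmation, locking and component testing that Mason-CM performs.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Path qw(make_path);

# Copy one file to its destination, creating the destination
# directory first if it does not exist yet.
# (Hypothetical helper; Mason-CM does considerably more than this.)
sub trigger_file {
    my ($source, $destination) = @_;
    my $dir = dirname($destination);
    make_path($dir) unless -d $dir;
    copy($source, $destination)
        or die "Cannot copy $source to $destination: $!";
    return $destination;
}

# Usage: perl trigger.pl staging/news/index.html production/news/index.html
if (@ARGV == 2) {
    print "Copied to ", trigger_file(@ARGV), "\n";
}
```

A reverse trigger is the same call with the two arguments swapped, which is exactly why it deserves a confirmation step: it overwrites the staging copy.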

Spell Checking

Once you have the basic Mason-CM functionality working, you may want to try some of the optional features that it includes. Perhaps the most interesting feature is the spell checker, which is a Mason component that uses ispell to check the spelling of the document. The Mason spell checker ignores HTML tags, so you need not worry about having to add “href” to the dictionary.
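
To illustrate what ignoring HTML tags involves, here is a deliberately naive sketch that removes anything between angle brackets before extracting words. The words_to_check function is hypothetical; the article does not describe how the Mason-CM component filters markup, and a robust version would also need to handle comments, entities such as &amp;amp; and attribute values containing a “>” character.

```perl
use strict;
use warnings;

# Return the words in $html that a spell checker should look at,
# with anything inside <...> removed first.
# (Hypothetical, naive helper; not Mason-CM's implementation.)
sub words_to_check {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;    # drop every tag
    my @words = $text =~ /[A-Za-z']+/g;     # keep runs of letters
    return @words;
}

print join(' ', words_to_check('<a href="news.html">Latest <b>nwes</b></a>')), "\n";
```

Run as is, the sketch prints “Latest nwes”: the tag names and the href value never reach the word list, so only the visible text would be passed on to ispell.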

To enable spell checking, uncomment the “ispell”, “main_dict” and “supp_dict” keys in the %cm_config, defined in cmConfig. They are commented out by default; on my Linux system, I was able to uncomment them without making any modifications:

ispell      => '/usr/bin/ispell',
main_dict => '/usr/lib/ispell/english.hash',
supp_dict => "$CM_DATA/suppDict",

Once you have defined these keys, the Mason-CM editor will automatically include a “spell check” checkbox. This means that every time you click on “save”, “save and render” or “save and exit,” the document will be spell-checked. If a word in the document is misspelled, a small JavaScript program allows you to choose an alternate, ignore the misspelling, or add the word to a dictionary. Everyone on a Mason-CM system shares a dictionary, meaning that if one user adds a word to the dictionary, everyone else will gain from it. (This also means that if one user accidentally adds a misspelled word to the dictionary, everyone will suffer, so be careful!)

Version Control

Mason-CM also supports the use of RCS for version control. This requires mason.pl to import the RCS module along with Image::Size, URI::Escape, and File::PathConvert. Following that, define (or uncomment) the following lines from cmConfig:

rcs_bin => "/usr/bin",
rcs_files => "$CM_DATA/archive",

Once Mason-CM sees that these values are defined, it adds a “version label” text field to the top of the editing page. If you enter a version label when the document is saved to disk, then RCS will automatically be used to keep both the older version of the file and the newer one.

Moreover, activating version control means that the file list will include a “versions” label. Clicking on this brings up a list of the document version history, and provides a nice interface to diff and the ability to check out older versions. Version control is almost a necessity when working on larger web sites, since bugs can creep in almost anytime, and it's often more important to use a stable, older version than an unstable, newer version with more features.

In addition to spell checking and RCS, Mason-CM includes a number of other features: users can upload files via HTTP and FTP, and administrators can restrict user access on a per-directory basis. Because Mason-CM is written in Mason, a straightforward dialect of Perl, it shouldn't be difficult to add other features, such as the ability to stage to other computers (rather than other directories) and HTML validation before staging.

Conclusion

Mason may be a powerful tool for creating web sites, but Mason-CM displays the versatility of this tool. Mason-CM demonstrates that Mason components may be used to create a tool that doesn't directly affect the content produced on the Web. I am very impressed with the variety of tools Mason-CM offers, and while I won't be giving up GNU Emacs as my editor of choice in the near future, I do expect to use Mason-CM on a number of my clients' sites—both those that use Mason for content generation, and those that use simpler, less-advanced tools.

Resources

Reuven M. Lerner (reuven@lerner.co.il) owns a consulting firm specializing in web and Internet technologies, based in Modi'in, Israel. As you read this, he should (finally) be done writing Core Perl, to be published by Prentice-Hall. You can reach him via e-mail at reuven@lerner.co.il, or at the ATF home page, http://www.lerner.co.il/atf/.