At the Forge

Integrating E-mail

Reuven M. Lerner

Issue #116, December 2003

Keep users coming back to your Web site with e-mail reminders about news or discussions that interest them.

I have been using my computer to communicate with other people for more than 20 years. What began as occasional participation in forums on local bulletin boards has become an inseparable part of my personal and professional lives. The daily flood of e-mail I receive from friends, relatives and colleagues often seems overwhelming—until I spend time without access to e-mail, at which point I realize exactly how important and useful it is.

This month, as I continue to unpack at my new home in Chicago and fight the bugs and glitches that make it difficult for me to continue discussing Bricolage, I look at a number of different issues having to do with e-mail in the modern era—Web/mail integration, mail/database integration and even fighting spam at the SMTP level.

Integrating the Web and E-mail

During the last two years, at least a dozen clients have asked me to set up Web-based forums. Indeed, it is rare to find a large, modern Web site that does not include a section for user feedback or participation. The question is, how do you incorporate forums into your site? In many cases, the answer depends on the type of site you're running.

If you're using a large toolkit, such as OpenACS, Zope or PHPNuke, at least one forum package is available for easy installation into your site. Not only is its look and feel closely integrated with the static pages and other applications, but these toolkits also apply the same users and permissions you have set on the rest of the site to the forums. In other words, you don't have to make someone a site administrator and a forum administrator separately; anyone with administrative privileges on the site is able to run the forums or any other application without further configuration.

Alternatively, you can install a separate Web forums package using the server-side technologies available on your server. For example, if PHP is available, you can install the Phorum package. It won't be integrated completely into the rest of your site, but Phorum is a powerful and stable package and works with both MySQL and PostgreSQL. There are many options from which to choose, with underlying technologies ranging from PHP and JSP to plain-old CGI programs written in Perl.

You also can go the proprietary route and license a package such as WebCrossing. I've used WebCrossing for a few clients, and although this package offers many more features than do open-source products such as Phorum, the differences are not great enough in most cases to warrant spending the money, let alone learning a package that cannot be modified or improved.

Finally, you can roll your own forums package, as I did in this column several years ago. Writing a set of programs that implement Web-based forums isn't difficult, although the time and effort spent developing and debugging them usually would be better spent learning an existing package. Still, for anyone who has a bit of experience with the Web and databases, producing a Web forums package means creating only a few tables (for users, messages and threads) and then providing people with the ability to insert new records (postings) into each thread.
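As a rough illustration of how little is involved, here is a minimal sketch of those three tables using Python's built-in sqlite3 module. The table names, column names and the open_forum and post_message helpers are assumptions made for this example and do not come from any particular forum package.

```python
import sqlite3

# Hypothetical schema: one table each for users, threads and messages.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE threads (
    id      INTEGER PRIMARY KEY,
    subject TEXT NOT NULL
);
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(id),
    user_id   INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL,
    posted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_forum(path=":memory:"):
    """Open (or create) the forum database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def post_message(conn, thread_id, user_id, body):
    """Insert one posting into an existing thread and return its id."""
    cur = conn.execute(
        "INSERT INTO messages (thread_id, user_id, body) VALUES (?, ?, ?)",
        (thread_id, user_id, body),
    )
    conn.commit()
    return cur.lastrowid
```

Everything else in a home-grown package (listing threads, displaying postings, checking logins) amounts to queries against these tables.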

All of these systems are more than sufficient for a typical small- to medium-sized Web site. And even if your site grows large, with thousands of messages posted by hundreds of users, it is unlikely to tax any of these systems. That's because these systems all use relational databases to store messages, and even the smallest and simplest modern database server can handle thousands of transactions per day.

In an increasing number of cases, however, a simple Web-based forum is not enough. Although people might be willing to look through Web-based forums, they probably are not going to return to them day after day to keep track of the discussion. Whereas e-mail is a push medium in which the information is sent to you, forums are a pull medium, where new messages wait for you to request them.

Push versus pull is not a new issue; those of us who remember the pre-Web Internet know that discussions used to be divided between mailing lists and Usenet newsgroups. The solution was to create various mail-to-Usenet gateways, many of which continue to be used today. Indeed, you can keep track of the latest bugs in GNU Emacs by subscribing to the bug-gnu-emacs mailing list or by reading the gnu.emacs.bug newsgroup on Usenet. The two are equivalent, facilitated by gateway software that transmits messages from one system to the other without unnecessary duplicates.

Do any such systems exist for Web-based forums? The answer is a tentative yes. After all, it's easy for a Web/database system to send mail to a list of e-mail addresses when a new message is posted. It is not much more difficult to create a sophisticated system of notifications, as provided in the OpenACS forums package, that allows users to subscribe to a particular forum or thread within a forum—and then choose to receive updates immediately or on a daily or monthly basis. In other words, the forum software is capable of creating an e-mail digest, in the same manner that list software can.
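To make the idea of immediate and digest notifications concrete, here is a minimal sketch in Python. It is not the OpenACS implementation; the tuple layout for subscriptions, the function names and the forums@example.com sender address are all assumptions made for illustration.

```python
from collections import defaultdict
from email.message import EmailMessage


def route_posting(posting, subscriptions):
    """Split a forum's subscribers into immediate and digest recipients.

    subscriptions is an iterable of (email, forum, frequency) tuples, where
    frequency is "immediate" or a digest period such as "daily".
    Returns (immediate_addresses, {frequency: [addresses]}).
    """
    immediate = []
    queued = defaultdict(list)
    for email, forum, frequency in subscriptions:
        if forum != posting["forum"]:
            continue  # subscribed to a different forum
        if frequency == "immediate":
            immediate.append(email)
        else:
            queued[frequency].append(email)
    return immediate, dict(queued)


def build_digest(forum, postings, recipient, sender="forums@example.com"):
    """Combine several postings into one digest message for one recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{forum}] digest: {len(postings)} new message(s)"
    body = "\n\n".join(f"{p['author']} wrote:\n{p['body']}" for p in postings)
    msg.set_content(body)
    return msg
```

A scheduled job would call build_digest once per period for each queued address, while immediate recipients get each posting as it arrives.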

Posting Mail

Things get a bit trickier, however, when you want to let forum participants use their own e-mail software to post new messages to the list. In particular, several issues must be addressed:

  • Security and permissions: should anyone be allowed to post or reply to the forum, or only members? If access to the forum is restricted, it is necessary to keep track of the allowed e-mail addresses. Of course, because it's so easy to forge e-mail headers, it's impossible to verify completely that a posting truly comes from a subscriber rather than from a worm or virus posing as one.

  • MIME: most popular e-mail programs, especially Microsoft Outlook, by default send mail as HTML, often with one or more attachments. Forum software must be intelligent enough to handle postings sent in this format, stripping out the HTML and attachments.

  • Size: if the mail-to-forum gateway does not weed out extremely large postings, someone could mount a denial-of-service attack against your Web site simply by submitting them. The software needs to filter this mail out and allow the site administrator to limit the size of user postings.

  • Threading: most forum software keeps related postings together, either by subject title or by keeping track of which posting was a reply to which. Preserving this when combining the Web and e-mail is difficult, though partly possible, because each medium uses a different mechanism for keeping track of such things.
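The four checks above can be sketched with Python's standard email package. This is only an illustration under stated assumptions: the 64KB limit, the Rejected exception, the accept_posting function and the idea of a known_ids table mapping Message-ID values to threads are all hypothetical, and checking the From header remains a weak filter for the reason already given.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr

MAX_BYTES = 64 * 1024  # assumed limit; a real site would make this configurable


class Rejected(Exception):
    """Raised when a posting fails one of the gateway checks."""


def accept_posting(raw, allowed_senders, known_ids):
    """Check one raw message and return the fields a forum would store.

    raw is the message as bytes, allowed_senders is a set of subscriber
    addresses, and known_ids maps the Message-ID of earlier postings to
    a thread id.
    """
    # Size: refuse oversized mail before doing any parsing work.
    if len(raw) > MAX_BYTES:
        raise Rejected("posting too large")

    msg = message_from_bytes(raw, policy=policy.default)

    # Permissions: the From header is easily forged, so this is only a
    # first filter, not proof of identity.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender not in allowed_senders:
        raise Rejected("sender is not a subscriber")

    # MIME: keep only the text/plain part, dropping HTML and attachments.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        raise Rejected("no plain-text part")
    body = part.get_content()

    # Threading: a reply names its parent in In-Reply-To; anything else
    # starts a new thread (thread_id is None).
    parent = (msg.get("In-Reply-To") or "").strip()
    return {
        "sender": sender,
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "thread_id": known_ids.get(parent),
    }
```

Postings that raise Rejected could be bounced, dropped or held for a moderator, depending on the policy of the site.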

I have seen a number of different ways to handle these and other issues. To date, I haven't seen any to my complete liking.

Phorum provides a script called phorummail that is designed to be invoked from the command line, presumably from a .forward or .qmail file or from a file containing mail alias definitions. Basically, the system administrator creates an alias (for example, apartments-forum) on the system and sets the .forward file to point to phorummail, passing the mandatory FORUM_ID parameter and the optional PATH_TO_FORUM parameter. Once you have done that, anyone sending mail to apartments-forum@your.site can post to the forum. Obviously, permissions have to be set appropriately in order for phorummail to work.

The problem is that there isn't much security on postings, although if the forum is moderated, postings that originate by mail are marked as unapproved until a moderator reviews them. Threading is taken care of in a number of clever ways, but there doesn't seem to be any provision for handling MIME or mail-bombing attacks. In other words, Phorum handles mail-to-forums as you might expect but without fancy features.

The OpenACS forum software includes a more sophisticated system based on qmail, in which every outgoing notification message is sent from a unique address. This means a reply to an e-mailed message is submitted automatically to the forum in question. The fact that OpenACS uses e-mail addresses as login names adds a tiny bit of security, but it remains difficult to stop people from forging messages.
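One way to picture a unique sender address per notification is to encode the forum and thread in the address and sign it, so that a reply can be routed back and a made-up address can be refused. The sketch below is not how OpenACS does it; the address layout, the SECRET value and both function names are assumptions for illustration only.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical site-wide secret


def _tag(forum_id, thread_id, secret):
    """Return a short keyed signature over the forum and thread ids."""
    data = f"{forum_id}.{thread_id}".encode()
    return hmac.new(secret, data, hashlib.sha256).hexdigest()[:12]


def reply_address(forum_id, thread_id, domain="example.com", secret=SECRET):
    """Build a per-thread sender address of the form forum-F-T-TAG@domain."""
    tag = _tag(forum_id, thread_id, secret)
    return f"forum-{forum_id}-{thread_id}-{tag}@{domain}"


def parse_reply_address(address, secret=SECRET):
    """Return (forum_id, thread_id) for a genuine address, else None."""
    local = address.split("@", 1)[0]
    parts = local.split("-")
    if len(parts) != 4 or parts[0] != "forum":
        return None
    try:
        forum_id, thread_id = int(parts[1]), int(parts[2])
    except ValueError:
        return None
    expected = _tag(forum_id, thread_id, secret).encode()
    # Compare in constant time; encode with replacement so odd input
    # cannot raise an exception.
    if not hmac.compare_digest(parts[3].encode("ascii", "replace"), expected):
        return None
    return forum_id, thread_id
```

A signed address shows only that the reply was sent to an address the site really issued. It says nothing about who sent the reply, so the forging problem remains.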

Mailing Lists and Databases

The best solution I can offer is to turn things on their head by making a Web-based forum an offshoot of an existing e-mail list. ezmlm, the mailing list package written by qmail author Dan Bernstein, has a set of extensions known as ezmlm-idx that, among other things, allows the subscriber list to be stored in either MySQL or PostgreSQL. When configuring the list, the administrator also configures some database tables and then points ezmlm to those tables.

This means that a good Web developer can create an e-mail list and then mirror that list to the Web. Anything coming from the Web appears to come from the currently logged-in user, who presumably had to log in to the Web-based forum application. Anything arriving by e-mail goes through the same checks that ezmlm normally applies.

Mailman, a mailing list program that works with all MTAs (including qmail, Sendmail, Postfix and Exim) and that is undergoing continuous and impressive development, doesn't yet seem to have any hooks for storing its users in a relational database. It does store them in relatively safe, easy-to-use Berkeley DB files, though, which make it easy for a Web-based forum package to read them from there.

Several problems arise from handling forums as if they were e-mail lists, beginning with the issues of threading, which, as mentioned above, are different in e-mail and the Web. Add to this the fact that many forums allow users to add highlighting and attachments and even edit their own postings, and it quickly becomes obvious that the marriage is going to be difficult—an impedance mismatch, if you will, between the two media.

Nevertheless, the functionality is useful enough for a large number of people who are willing to ignore the fringe issues. Instead, they focus on increasing the number of participants in their forums and on giving users a choice regarding how they participate in these forums.

qpsmtpd

Like many of you, I am plagued by a torrent of spam on a daily basis. SpamAssassin, the open-source tool for analyzing and categorizing incoming e-mail, has proven to be an excellent ally in my fight against spam. And if you ever have run an e-mail list, you undoubtedly have discovered that e-mail worms don't discriminate; they post to lists as easily as they send mail to individuals.

Although system administration and SMTP servers are a bit off-topic for this column, I feel compelled to sing the praises of qpsmtpd, the open-source SMTP server developed by Ask Bjoern Hansen. qpsmtpd originally was designed for use with qmail, but now it apparently is able to work with other MTAs, including Sendmail and Postfix.

Why would you want to install qpsmtpd in place of the default qmail-smtpd? If you're a Perl hacker, the reason to switch is that qpsmtpd is written in Perl. But even if you're less of a language bigot, you can find a lot to love. That's because qpsmtpd divides the SMTP conversation into a number of stages and allows you to add your own hooks and functionality to each of these stages.

I downloaded qpsmtpd from its home at www.develooper.com (yes, that's two O characters in a row), followed the installation instructions and was up and running in about 20 minutes. Remember that qpsmtpd is a full-fledged SMTP server, meaning it refuses to run if you have another SMTP server listening on port 25. If you have been using daemontools to ensure the SMTP server starts at boot time and stays up following that, you should double-check that no link exists from /service to the old SMTP server. Otherwise, you might end up with two competing SMTP dæmons when your machine is next restarted.

The key to qpsmtpd is its plugins, which live in the plugins subdirectory. You can add or remove plugins by modifying the config/plugins file, with one plugin listed per line. For example, a portion of my config/plugins file looks like this:

# quit_fortune

check_earlytalker
count_unrecognized_commands 4

require_resolvable_fromhost

In other words, I have commented out the quit_fortune plugin but activated the check_earlytalker, count_unrecognized_commands and require_resolvable_fromhost plugins. count_unrecognized_commands takes a single numeric argument, which here is 4.
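The format of the file is simple enough to read with a few lines of code. The following Python sketch is not qpsmtpd's own loader (which is written in Perl); it assumes only what the sample shows: one plugin per line, optional whitespace-separated arguments, and # for comments. The function name parse_plugin_config is made up for this example.

```python
def parse_plugin_config(text):
    """Return a list of (plugin_name, [arguments]) for each active line."""
    plugins = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out plugins
        name, *args = line.split()
        plugins.append((name, args))
    return plugins


SAMPLE = """\
# quit_fortune

check_earlytalker
count_unrecognized_commands 4

require_resolvable_fromhost
"""

for name, args in parse_plugin_config(SAMPLE):
    print(name, args)
```

Note that arguments come back as strings, so a plugin expecting a number would convert "4" itself.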

To see these plugins or to add your own, go into the plugins directory itself. Each plugin consists of a register subroutine that attaches the plugin to one of qpsmtpd's various hooks and another subroutine that is invoked by the hook. For example, the require_resolvable_fromhost plugin begins with the following:

# Import the mx() lookup function from Net::DNS
use Net::DNS qw(mx);

# Attach mail_handler to the "mail" hook
sub register {
  my ($self, $qp) = @_;
  $self->register_hook("mail", "mail_handler");
}

In other words, the register subroutine tells qpsmtpd that whenever the SMTP client invokes the mail command, the mail_handler subroutine also should be invoked. That subroutine does the following:


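sub mail_handler {
  my ($self, $transaction, $sender) = @_;

  # Skip the null sender (<>), and do nothing unless the
  # require_resolvable_fromhost config option is set. Otherwise,
  # defer the mail if the sender's host does not resolve.
  $sender->format ne "<>"
    and $self->qp->config
       ("require_resolvable_fromhost")
    and !check_dns($sender->host)
    and return (DENYSOFT,
       ($sender->host
        ? "Could not resolve ". $sender->host
        : "FQDN required in the envelope sender"));

  # No objection from this plugin
  return DECLINED;
}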

If you have done any Web development in mod_perl, this should look somewhat familiar to you. mail_handler can return DECLINED, which indicates that this plugin has no objection and the mail can continue through the remaining checks. Or, it can return DENYSOFT, which refuses the mail temporarily and allows the sender to try again later. The plugin uses DENYSOFT because we don't want to start rejecting mail outright when a DNS server goes down; we're interested in punishing only spammers and others who shouldn't be sending mail directly. A plugin also can return DENY, which rejects the mail outright.
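The register-then-dispatch pattern itself is easy to model. The sketch below is written in Python rather than Perl and is not qpsmtpd's code; the Server class, the Result enumeration and the resolves callback (a stand-in for a real DNS lookup) are assumptions made to illustrate how handlers attached to a hook can each decline or deny. For brevity it leaves out the null-sender and configuration checks that the real plugin performs.

```python
from enum import Enum


class Result(Enum):
    DECLINED = "declined"  # no objection; let the next handler decide
    DENY = "deny"          # reject permanently
    DENYSOFT = "denysoft"  # reject for now; the sender may retry


class Server:
    """Toy dispatcher that keeps a list of handlers for each hook name."""

    def __init__(self):
        self.hooks = {}

    def register_hook(self, hook, handler):
        self.hooks.setdefault(hook, []).append(handler)

    def run_hooks(self, hook, *args):
        """Call handlers in order; the first non-DECLINED result wins."""
        for handler in self.hooks.get(hook, []):
            result, message = handler(*args)
            if result is not Result.DECLINED:
                return result, message
        return Result.DECLINED, ""


def require_resolvable_fromhost(resolves):
    """Build a "mail" handler; resolves(host) stands in for a DNS check."""
    def mail_handler(sender_host):
        if not sender_host:
            return Result.DENYSOFT, "FQDN required in the envelope sender"
        if not resolves(sender_host):
            return Result.DENYSOFT, f"Could not resolve {sender_host}"
        return Result.DECLINED, ""
    return mail_handler


server = Server()
server.register_hook(
    "mail", require_resolvable_fromhost(lambda host: host == "example.com")
)
```

Because every handler returns a code instead of acting directly, new checks can be added to a hook without touching the ones already registered.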

I was able to write a new, working plugin within a few hours of downloading qpsmtpd, despite the lack of good documentation, and I'm sure that many other readers will have similar experiences. The fact that qpsmtpd is written in Perl means you have fast, easy access to everything available to an ordinary Perl program, including any CPAN modules that could make development easier.

You can attach plugins to a number of different hooks, including helo, ehlo, connect and even rcpt—each of which can perform tests of various sorts. There's even a SpamAssassin plugin for qpsmtpd, which invokes the famous spam-checking software before the message arrives in your mailbox.

I have been using qpsmtpd for about a month, and the amount of spam in my mailbox has declined rather impressively, even from the low amount that SpamAssassin was letting through. If you run your own machine, I strongly encourage you to look at qpsmtpd. It is an excellent example of how to write software to take arbitrary plugins, and as a bonus, you will receive only the mail that you should receive.

Conclusion

E-mail is a vital part of the Internet, as anyone reading this column undoubtedly knows. But as the Internet continues to expand, e-mail is being pushed in a number of different directions. This month, we looked at Web-based forums and at some of the ways in which we can connect them to e-mail lists. We also took a brief look at qpsmtpd, an alternative SMTP server designed to be highly extensible and configurable. Next month, I hope to return to Bricolage and other open-source content management systems, with an emphasis on how to incorporate a CMS into your existing Web sites.

Reuven M. Lerner, a longtime consultant in Web/database programming, is now a graduate student in Learning Sciences at Northwestern University in Chicago. As he writes this in September, he still is hoping that winters in Chicago won't be much different from what they were in Israel.