Missing CGI.pm and Other Mysteries

Reuven M. Lerner

Issue #37, May 1997

CGI.pm, for all of its useful and amazing features, is just one of the many terrific Perl 5 modules that isn't included with the standard Perl distribution.

I have reached the point in my career as a columnist when there is too much mail to ignore. I'm not drowning in torrents of e-mail, mind you, but it's still a nice surprise to receive responses from readers. Some of the mail that I have received over the last month or two, though, warrants extended response. In addition to answering some brief questions about CGI.pm and the guestbook program that appeared in the January issue, I will discuss security issues relevant to CGI programmers and Web administrators.

Where Is CGI.pm?

One of the first questions that I received—from several readers of my column in the January issue—is, “Where is CGI.pm?” These readers were surprised that the programs included with my column and which were supposed to work, were failing on them. In particular, they were getting messages like this:

   Can't locate CGI.pm in @INC at - line 1.
   BEGIN failed—compilation aborted at - line 1.

What was going wrong here? Why wasn't CGI.pm on their systems?

The simple answer to this question is that CGI.pm, for all of its useful and amazing features, is just one of the many terrific Perl 5 modules that isn't included with the standard Perl distribution. Perl comes with a number of basic modules, but these are only the tip of the iceberg. Most of the modules you might want to use are available from CPAN, the Comprehensive Perl Archive Network—such as those for database server access (obviating the need for separate Perl executables, such as oraperl and sybperl), manipulation of times and dates, handling of e-mail folders, and many more.

CPAN is a set of FTP sites that mirror each other at regular intervals, and which allow programmers to download the most recent versions of various modules programmers have generously donated to the Perl community. One of these modules, and one which I tend to use often in my professional life and in my LJ column, is CGI.pm—which, as you might have guessed, is a module that makes it relatively easy for us to write CGI programs.

The easiest way to get to CPAN is via the reflector at perl.com, a site run and maintained by Tom Christiansen, one of the luminaries of the Perl community. If you go to http://www.perl.com/CPAN, making sure you leave off the final slash, you will be able to select a site near you from which you can download various Perl modules. Alternatively, you can include the final slash, as well as the rest of the path name relative to CPAN, and enter a random site from the full CPAN network, as follows:

   http://www.perl.com/CPAN/modules/by-module/

This URL will result in a listing of the various module categories available for downloading. Each category contains one or more modules; for CGI.pm, we need to enter the CGI category, where we can find (as of this writing) the file CGI.pm-2.30.tar.gz.

After downloading this file, use the gunzip program to uncompress the file, and then the tar program to expand it. Do this with these commands:

   gunzip --verbose CGI.pm-2.30.tar.gz
   tar -xvvf CGI.pm-2.30.tar.gz

The doubled v option specifies additional “verbosity” when expanding files; while you can certainly untar CGI.pm without using any verbosity, I prefer to see what I'm expanding, rather than simply let the command go about its business.

If you are using a system with GNU tar (as is the case with virtually all Linux systems), you can combine these two operations by using the z option with tar:

   tar -zxvvf CGI.pm-2.30.tar.gz

After unpacking CGI.pm in this way, you should be able to enter the newly-created directory (named CGI.pm-2.30 in the above example), configure, and compile the file with the standard Perl module installation commands. Here is what that process looked like on my system:

   [1008] /downloads% cd CGI.pm-2.30
   [1009] /downloads/CGI.pm-2.30% perl Makefile.PL
   Checking if your kit is complete...
   Looks good
   Writing Makefile for CGI
   [1010] /downloads/CGI.pm-2.30% make
   cp CGI/Carp.pm ./blib/lib/CGI/Carp.pm
   cp CGI/Fast.pm ./blib/lib/CGI/Fast.pm
   cp CGI/Push.pm ./blib/lib/CGI/Push.pm
   cp CGI.pm ./blib/lib/CGI.pm
   Magnifying ./blib/man3/CGI::Fast.3
   Magnifying ./blib/man3/CGI::Carp.3
   Magnifying ./blib/man3/CGI::Push.3
   Magnifying ./blib/man3/CGI.3
Now that you have configured and compiled CGI.pm, install it into your system with the command make install. In order to do this, you will need to be logged in as the root user, as shown here:
   [1001] /downloads/CGI.pm-2.30# make install
   Skipping /usr/lib/perl5/site_perl/./CGI/Carp.pm (unchanged)
   Skipping /usr/lib/perl5/site_perl/./CGI/Fast.pm (unchanged)
   Skipping /usr/lib/perl5/site_perl/./CGI/Push.pm (unchanged)
   Installing /usr/lib/perl5/site_perl/./CGI.pm
   Skipping /usr/lib/perl5/man/man3/./CGI::Fast.3 (unchanged)
   Skipping /usr/lib/perl5/man/man3/./CGI::Carp.3 (unchanged)
   Skipping /usr/lib/perl5/man/man3/./CGI::Push.3 (unchanged)
   Installing /usr/lib/perl5/man/man3/./CGI.3
   Writing /usr/lib/perl5/site_perl/i586-linux/auto/CGI/.packlist
   Appending installation info to /usr/lib/perl5/i386-linux/5.003/perllocal.pod
That's it. Now, @INC (the Perl variable that knows where to look for Perl modules) will include CGI.pm, and you will no longer get those nasty error messages complaining that Perl cannot find the file.

Note that Red Hat Linux users might want to use the RPM (Red Hat Package Manager) version of CGI.pm (and other Perl modules) rather than the standard distribution. The advantage of doing this is that installation updates the RPM database and keeps track of the files on your system in an elegant way. The disadvantage is that it often takes a few days or weeks for the latest and greatest version of CGI.pm to appear on the Red Hat servers—and other, less popular modules are sometimes completely unavailable as RPMs. You can find various RPMs at the Red Hat site (and its mirrors), at ftp.redhat.com.

Guestbook Problems

I also received several notes from readers alerting us to two mistakes in the guestbook program in the January issue. Guestbooks, as we all know, generally contain more than one greeting from a user on a site. Thus, if we open the file using the Perl command:

   open (FILE, ">$filename") || &error_opening_file($filename);

we are asking for trouble, since the single > operator not only opens the file for writing, but destroys any information the file might have contained previously. The code should really have read:

   open (FILE, ">>$filename") || &error_opening_file($filename);
which means that we want to open the file named in $filename for writing, appending our new data to whatever might have been there before. Note that the >> operator creates a file if none existed before, so you should feel free to use >> for file creation and appending.

The other problem in that program, which was noticed by reader Bill Holloway, had to do with this section of code:

   @names = $query->param;
   # Iterate through each element from the form,
   # writing each element to $filename. Separate
   # elements with $separation_character defined
   # above.
   foreach $index (0 .. $#fields)
   {
      # Get the input from the appropriate HTML
      # form element
      $input = $query->param($fields[$index]);
      # Remove any instances of
      # $separation_character
      $input =~ s/$separation_character//g;
      # Now add the input to the file
      print FILE $input;
      # Don't print the separation character after
      # the final element
      print FILE $separation_character
         if ($index
   }

Of course, since we have imported the HTML form elements into the @names array, we have to read them out of @names, and not out of @fields, which is what the above code does. Thus the line:

   $input = $query->param($fields[$index]);
should be replaced with:
   $input = $query->param($names[$index]);
as you can see in the corrected version of the program, which appears in Listing 1.

Individual Users and CGI Directories

Another reader, Maro Shim (writing from Korea), noticed something concerning what I said in the February issue about having to add a ScriptAlias or Exec directive to the HTTP server's configuration file each time a new CGI directory needed to be added. Maro points out that this means an administrator has to modify the files for each individual user.

Let's get into the pros and cons of letting individual users have their own CGI directories, using Apache as an example. Then we'll discuss why this might not be the best thing to do. Finally, we'll discuss giving each user CGI access, but not giving them the run of the system.

Maro's suggestion is that administrators can create a symbolic link inside the cgi-bin directory (which is /home/httpd/cgi-bin by default for the copy of Apache running on my Red Hat Linux box), and that this link can point to a directory inside each user's public_html directory, which typically contains the user's HTML files.

For example, here is a listing of my personal home directory at the time of this writing:

   [1068] ~% ls  -F
   800omni.pdf    News/          public_html/
   Consulting/    Text/          response1.txt
   Development/   cgicyrcode.pl  test.dgs
   Mail/          chap4de.doc

Because I have used the -F option to ls, directory names end with slashes, which makes them easier to identify. You can also identify directories by color or boldface text if you use the --color option, but I'm too old-fashioned for that. The public_html directory is where my personal HTML files reside, which are available via a URL ending with ~reuven/, since my username is reuven, and the web server is configured to look in a user's ppublic_html directory. Thus, if there were a file index.html, it would be accessible via the URL:

   http://localhost/~reuven/index.html
(substituting an appropriate hostname for localhost, of course).

Personal HTML files are nice, and greatly reduce the amount of work that a system administrator must do in order to run a web server on which dozens, or perhaps hundreds, of users might want to put their own home pages. But what about CGI programs? That's where Maro's letter comes in: Inside the public_html directory we can create a subdirectory named cgi-bin, as follows:

   [1071] ~% cd public_html/
   [1072] ~/public_html% mkdir cgi-bin
   [1073] ~/public_html% ls -F
   cgi-bin/   test.html

Now the personal HTML directory contains two items—a file, test.html, which (in this case) can access ~reuven/test.html, and a directory named cgi-bin, the contents of which I can access as ~reuven/cgi-bin/. Remember, there isn't any magic in the name cgi-bin—at this point, it acts just like any other subdirectory. Indeed, if I were to place the CGI program elephant.pl inside ~reuven/public_html/cgi-bin, I could access it by going to:

   http://localhost/~reuven/cgi-bin/elephant.pl
But rather than seeing the results of executing elephant.pl, we will see its source code. This is true because we haven't told our server that it should execute the program; we need to explicitly install ~reuven/cgi-bin as a CGI directory. This is the most common way to create personal CGI directories. By including (under Apache) a ScriptAlias directive in the file srm.conf, we can create new CGI directories for each user on a system. Thus, if we were interested in turning ~reuven/cgi-bin into a CGI directory, we could use the line:
   ScriptAlias /~reuven/cgi-bin/ \
   /home/reuven/public_html/cgi-bin
which would have the desired effect. However, this means that every time we wish to give a user a CGI directory, we need to modify srm.conf and restart our HTTP server.

Maro's alternative saves us this work by taking a different approach: Rather than add new ScriptAlias directives to srm.conf, we simply tell our HTTP server that it should follow symbolic links within the CGI directories that already exist, using the commands:

   <Directory /home/httpd/cgi-bin>
   AllowOverride None
   Options FollowSymlinks
   </Directory>

Once we have done that, we can create symlinks to any directories that we want to turn into CGI directories. For example, to turn /home/reuven/public_html/cgi-bin/ into a CGI directory, we (as root, or another user with appropriate permissions) would only have to create the symbolic link:

   ln -s /home/httpd/cgi-bin/reuven \
   /home/reuven/public_html/cgi-bin
which would then let us use:
   http://localhost/cgi-bin/reuven/elephant.pl
which physically exists in my own personal directory, but which logically exists (as far as the HTTP server is concerned) in the /cgi-bin directory, which forces the server to execute it.

Before you turn on CGI directories for individual users, consider the ramifications: CGI programs are potentially an opening from the outside world into your server. If even one CGI program is written with malice aforethought, an attacker could gain access to your system—gathering information about your users, for example, or using that information to alter or damage files. It might seem convenient to give all users access to CGI programs, and it will certainly save you time in the short run, but the security implications are too serious to ignore.

If you cannot restrict CGI to a small subset of the users on your system, then you should consider installing a CGI wrapper program that performs safety checks before executing these programs. A CGI wrapper is a program which takes a CGI program as its argument. After the wrapper performs several security checks, it executes the CGI program—under the owner's ID, rather than the ID normally reserved for web programs. This prevents one CGI program from reading or changing another program's data—an increasingly possible problem as large numbers of unrelated sites are hosted on the same system.

One such wrapper, known as suEXEC, comes with Apache 1.2. Configuration and compilation of this program is relatively easy and is described in detail in the Apache documentation. Simply put, you compile suEXEC and set it to be SUID root, so it can change to the user ID of the user, regardless of who that owner might be. Finally, you will have to install the suexec program outside the normal CGI directory in a location defined in the httpd.h file in the Apache source code.

Another popular CGI wrapper is CGIwrap, which works in a similar way without being tied to a particular HTTP server. You can read more about CGIwrap at:

   http://www.umr.edu/~cgiwrap/

It is a good idea for these wrappers to run CGI programs under a user ID other than your HTTP server's default, letting individual users write and install various programs of their choosing, the possibility of sending programs data that can overflow buffers, or that might pass malicious arguments to programs using the Unix shell is too great to ignore, particularly with the security holes for which Unix is famous. You might want to insist that any CGI program on your server written in Perl use the -T argument, which turns on Perl's taint system that prevents user data from being passed to the shell without going through some sort of filter—but of course, such checks can be ignored, and not all CGI programs are written in Perl.

In short, there isn't any perfect solution, which means that at some point you will have to decide whether to make your system safer (but with angry users), or more exposed to possible damage (but with users satisfied with their ability to run CGI programs of their choosing).

Permissions for CGI Programs

While we're on the subject of security, this is probably a good time for me to publicly wipe away some of the egg that remains on my face in the wake of my February column, in which I suggested that you should install CGI programs with permissions of 777, known to non-numeric types as “a+rwx”, or permission for all users on the system to read, write, and execute the program.

Suffice it to say that this is a grave error, as several readers noticed. Computer security depends on plugging as many holes as possible. On networked multiuser systems running programs that come from various sources, it's almost certainly a bad idea to install a program having permissions that let anyone on the system modify the contents of that program, particularly when a simple (and probably hard-to-notice) modification or two can turn a seemingly innocuous program into a ravenous bug-blatter beast. On a system not running one of the wrappers mentioned here, all CGI programs are run with the same permission, meaning that someone could write a program that can mess with the code or data of another.

If you are the only programmer working on a particular CGI program or Web site, then you can install your programs with 755 permission (u=rwx,ga+rx), so that others on the system—including the HTTP server, which is generally responsible for running CGI programs—can read and execute your code but cannot modify it.

If you are working with others on a site or CGI program, you can set the permissions to 775 (ug=rwx,a+rx), which lets everyone read and execute the program, but allows only the owner and members of the file's group to edit it.

There are probably times when it is appropriate to install a CGI program with 777 (a+rwx) permission, but these are rare.

That's it for the mailbag for this time. Next month, we'll return to a discussion of how to make life easier for non-programmers who might want to modify entries in tables on disk, by writing a few small CGI programs which can read and write files efficiently and easily.

Reuven M. Lerner has been playing with the Web since early 1993, when it seemed like more like a fun toy than the World's Next Great Medium. He currently works as a independent Internet and Web consultant from his apartment in Haifa, Israel. When not working on the Web or volunteering in informal educational programs, he enjoys reading on just about any subject, but particularly politics and philosophy, cooking, solving crossword puzzles and hiking. You can reach him at reuven@the-tech.mit.edu or reuven@netvision.net.il.