Paranoid Penguin

Building a Secure Squid Web Proxy, Part IV

Mick Bauer

Issue #184, August 2009

Add squidGuard's blacklist functionality to your Squid proxy.

In my previous three columns [April, May and July 2009], I described the concept, benefits and architectural considerations of outbound Web proxies (Part I); discussed basic Squid installation, configuration and operation (Part II); and explained Squid Access Control Lists (ACLs), its ability to run as an unprivileged user and provided some pointers on running Squid in a chroot jail (Part III).

Although by no means exhaustively detailed, those articles nonetheless cover the bulk of Squid's built-in security functionality (ACLs, running nonroot and possibly running chrooted). This month, I conclude this series by covering an important Squid add-on: squidGuard.

squidGuard lets you selectively enforce “blacklists” of Internet domains and URLs you don't want end users to be able to reach. Typically, people use squidGuard with third-party blacklists from various free and commercial sites, so that's the usage scenario I describe in this article.

Introduction to squidGuard

Put simply, squidGuard is a domain and URL filter. It filters domains and URLs mostly by comparing them against lists (flat files), but also, optionally, by comparing them against regular expressions.

squidGuard does not filter the actual contents of Web sites. This is the domain of appliance-based commercial Web proxies such as Blue Coat, and even products like that tend to emphasize URL/domain filtering over actual content parsing due to the high-computing (performance) cost involved.

You may wonder, what have URL and domain filtering got to do with security? Isn't that actually a form of censorship and bandwidth-use policing? On the one hand, yes, to some extent, it is.

Early in my former career as a firewall engineer and administrator, I rankled at management's expectation that I maintain lists of the most popular URLs and domains visited. I didn't think it was my business what people used their computers for, but rather it should be the job of their immediate supervisors to know what their own employees were doing.

But the fact is, organizations have the right to manage their bandwidth and other computing resources as they see fit (provided they're honest with their members/employees about privacy expectations), and security professionals are frequently in the best “position” to know what's going on. Firewalls and Web proxies typically comprise the most convenient “choke points” for monitoring or filtering Web traffic.

Furthermore, the bigger domain/URL blacklists frequently include categories for malware, phishing and other Web site categories that do, in fact, have direct security ramifications. For example, the free Shalla's Blacklists include more than 27,600 known sources of spyware!

Even if you don't care about conserving bandwidth or enforcing acceptable-use policies, there's still value in configuring squidGuard to block access to “known dangerous” Web sites. That's precisely what I'm going to show you how to do.

Getting and Installing squidGuard

If you run a recent version of Fedora, SUSE, Debian or Ubuntu Linux, squidGuard is available as a binary package from your OS's usual software mirrors (in the case of Ubuntu, it's in the universe repositories). If you run RHEL or CentOS, however, you need to install either Dag Wieers' RPM of squidGuard version 1.2, Excalibur Partners' RPM of squidGuard version 1.4, or you'll have to compile squidGuard from the latest source code, available at the squidGuard home page (see Resources for the appropriate links).

Speaking of squidGuard versions, the latest stable version of squidGuard at the time of this writing is squidGuard 1.4. But, if your Linux distribution of choice provides only squidGuard 1.2, as is the case with Fedora 10 and Ubuntu 9.04, or as with OpenSUSE 11.1, which has squidGuard 1.3, don't worry. Your distribution almost certainly has back-ported any applicable squidGuard 1.4 security patches, and from a functionality standpoint, the most compelling feature in 1.4 absent in earlier versions is support for MySQL authentication.

Hoping you'll forgive my Ubuntu bias of late, the command to install squidGuard on Ubuntu systems is simply:

bash-$ sudo apt-get squidguard

As noted above, squidGuard is in the universe repository, so you'll either need to uncomment the universe lines in /etc/apt/sources.list, or open Ubuntu's Software Sources applet, and assuming it isn't already checked, check the box next to Community-maintained Open Source software (universe), which will uncomment those lines for you.

Besides using apt-get from a command prompt to install squidGuard, you could instead use the Synaptic package manager. Either of these three approaches automatically results in your system's downloading and installing a deb archive of squidGuard.

If you need a more-current version of squidGuard than what your distribution provides and are willing to take it upon yourself to keep it patched for emerging security bugs, the squidGuard home page has complete instructions.

Getting and Installing Blacklists

Once you've obtained and installed squidGuard, you need a set of blacklists. There's a decent list of links to these at squidguard.org/blacklists.html, and of these, I think you could do far worse than Shalla's Blacklists (see Resources), a free-for-noncommercial-use set that includes more than 1.6 million entries organized into 65 categories. It's also free for commercial use; you just have to register and promise to provide feedback and list updates. Shalla's Blacklists are the set I use for the configuration examples through the rest of this article.

Once you've got a blacklist archive, unpack it. It doesn't necessarily matter where, so long as the entire directory hierarchy is owned by the same user and group under which Squid runs (proxy:proxy on Ubuntu systems). A common default location for blacklists is /var/lib/squidguard/db.

To extract Shalla's Blacklists to that directory, I move the archive file there:

bash-$ cp mv shallalist.tar.gz /var/lib/squidguard/db

Then, I unpack it like this:

bash-$ sudo -s
bash-# cd /var/lib/squidguard/db
bash-# tar --strip 1 -xvzf shallalist.tar.gz
bash-# rm shallalist.tar.tz

In the above tar command, the --strip 2 option strips the leading BL/ from the paths of everything extracted from the shallalist tar archive. Without this option, there would be an additional directory (BL/) under /var/lib/squidguard/db containing the blacklist categories, for example, /var/lib/squidguard/db/BL/realestate rather than /var/lib/squidguard/db/realestate. Note that you definitely want to delete the shallalist.tar.gz file as shown above; otherwise, squidGuard will include it in the database into which the contents of /var/lib/squidguard/db will later be imported.

Note also that at this point you're still in a root shell; you need to stay there for just a few more commands. To set appropriate ownership and permissions for your blacklists, use these commands:

bash-# chown -R proxy:proxy /var/lib/squidguard/db/
bash-# find /var/lib/squidguard/db -type f | xargs chmod 644
bash-# find /var/lib/squidguard/db -type d | xargs chmod 755
bash-# exit

And with that, your blacklists are ready for squidGuard to start blocking. After, that is, you configure squidGuard to do so.

Configuring squidGuard

On Ubuntu and OpenSUSE systems (and probably others), squidGuard's configuration file squidGuard.conf is kept in /etc/squid/, and squidGuard automatically looks there when it starts. As root, use the text editor of your choice to open /etc/squid/squidGuard.conf. If using a command-line editor like vi on Ubuntu systems, don't forget to use sudo, as with practically everything else under /etc/, you need to have superuser privileges to change squidGuard.conf.

squidGuard.conf's basic structure is:

  1. Options (mostly paths)

  2. Time Rules

  3. Rewrite Rules

  4. Source Addresses

  5. Destination Classes

  6. Access Control Lists

In this article, my goal is quite modest: to help you get started with a simple blacklist that applies to all users, regardless of time of day, and without any fancier URL-rewriting than redirecting all blocked transactions to the same page. Accordingly, let's focus on examples of Options, Destination Classes and ACLs. Before you change squidGuard.conf, it's a good idea to make a backup copy, like this:

bash-$ sudo cp /etc/squid/squidGuard.conf
/etc/squid/squidGuard.conf.def

Now you can edit squidGuard.conf. First, at the top, leave what are probably the default values of dbhome and logdir, which specify the paths of squidGuard's databases of blacklists (or whitelists—you also can write ACLs that explicitly allow access to certain sites and domains) and its log files, respectively. These are the defaults in Ubuntu:

dbhome /var/lib/squidguard/db
logdir /var/log/squid

These paths are easy enough to understand, especially considering that you just extracted Shalla's Blacklists to /var/lib/squidguard/db. Whatever you do, do not leave a completely blank line at the very top of the file; doing so prevents squidGuard from starting properly.

Next, you need to create a Destination Class. This being a security column, let's focus on blocking access to sites in the spyware and remotecontrol categories. You certainly don't want your users' systems to become infected with spyware, and you probably don't want users to grant outsiders remote control of their systems either.

Destination Classes that describe these two categories from Shalla's Blacklists look like this:

dest remotecontrol {
        domainlist      remotecontrol/domains
        urllist         remotecontrol/urls
}

dest spyware {
        domainlist      spyware/domains
        urllist         spyware/urls
}

As you can see, the paths in each domainlist and urllist statement are relative to the top-level database path you specified with the dbhome option. Note also the curly bracket {} placement: left brackets always immediately follow the destination name, on the same line, and right brackets always occupy their own line at the end of the class definition.

Finally, you need an ACL that references these destinations—specifically, one that blocks access to them. The ACL syntax in squidGuard is actually quite flexible, and it's easy to write both “allow all except...” and “block all except...” ACLs. Like most ACL languages, they're parsed left to right, top to bottom. Once a given transaction matches any element in an ACL, it's either blocked or passed as specified, and not matched against subsequent elements or ACLs.

You can have multiple ACL definitions in your squidGuard.conf file, but in this example scenario, it will suffice to edit the default ACL. A simple default ACL that passes all traffic unless destined for sites in the remotecontrol or spyware blacklists would look like this:

acl {
        default {
                pass !remotecontrol !spyware all
                redirect http://www.google.com
        }
}

In this example, default is the name of the ACL. Your default squidGuard.conf file probably already has an ACL definition named default, so be sure either to edit that one or delete it before entering the above definition; you can't have two different ACLs both named default.

The pass statement says that things matching remotecontrol (as defined in the prior Destination Class of that name) do not get passed, nor does spyware, but all (a wild card that matches anything that makes it that far in the pass statement) does. In other words, if a given destination matches anything in the remotecontrol or spyware blacklists (either by domain or URL), it won't be passed, but rather will be redirected per the subsequent redirect statement, which points to the Google home page.

Just to make sure you understand how this works, let me point out that if the wild card all occurred before !remotecontrol, as in “pass all !remotecontrol !spyware”, squidGuard would not block anything, because matched transactions aren't compared against any elements that follow the element they matched. When constructing ACLs, remember that order matters!

I freely admit I'm being very lazy in specifying that as my redirect page. More professional system administrators would want to put a customized “You've been redirected here because...” message onto a Web server under their control and list that URL instead. Alternatively, squidGuard comes with a handy CGI script that displays pertinent transaction data back to the user. On Ubuntu systems, the script's full path is /usr/share/doc/squidguard/examples/squidGuard.cgi.gz.

This brings me to my only complaint about squidGuard: if you want to display a custom message to redirected clients, you either need to run Apache on your Squid server and specify an http://localhost/ URL, or specify a URL pointing to some other Web server you control. This is in contrast to Squid itself, which has no problem displaying its own custom messages to clients without requiring a dedicated HTTP dæmon (either local or external).

To review, your complete sample squidGuard.conf file (not counting any commented-out lines from the default file) should look like this:

dbhome /var/lib/squidguard/db
logdir /var/log/squid

dest remotecontrol {
        domainlist      remotecontrol/domains
        urllist         remotecontrol/urls
}

dest spyware {
        domainlist      spyware/domains
        urllist         spyware/urls
}

acl {
        default {
                pass !remotecontrol !spyware all
                redirect http://www.google.com
        }
}

Now that squidGuard is configured and, among other things, knows where to look for its databases, you need to create actual database files for the files and directories under /var/lib/squidGuard/db, using this command:

bash-$ sudo -u proxy squidGuard -C all

This imports all files specified in active Destination Class definitions in squidGuard.conf, specifically in this example, the files remotecontrol/domains, remotecontrol/urls, spyware/domains and spyware/urls, into Berkeley DB-format databases. Obviously, squidGuard can access the blacklists much faster using database files than by parsing flat text files.

Configuring Squid to Use squidGuard

The last thing you need to do is reconfigure Squid to use squidGuard as a redirector and tell it how many redirector processes to keep running. The location of your squidGuard binary is highly distribution-specific; to be sure, you can find it like this:

bash-$ which squidGuard
/usr/bin/squidGuard

As for the number of redirector processes, you want a good balance of system resource usage and squidGuard performance. Starting a lot of redirectors consumes resources but maximizes squidGuard performance, whereas starting only a couple conserves resources by sacrificing squidGuard performance. Ubuntu's default of 5 is a reasonable middle ground.

The squid.conf parameters for both of these settings (redirector location and number of processes) are different depending on with which version of Squid you're using squidGuard. For Squid versions 2.5 and earlier, they're redirect_program and redirect_children. For Squid versions 2.6 and later, they're url_rewrite_program and url_rewrite_program.

For example, on my Ubuntu 9.04 system, which runs Squid version 2.7, I used a text editor (run via sudo) to add the following two lines to /etc/squid/squid.conf:

url_rewrite_program /usr/bin/squidGuard
url_rewrite_children 5

As with any other time you edit /etc/squid/squid.conf, it's probably a good idea to add custom configuration lines before or after their corresponding comment blocks. squid.conf, you may recall, is essentially self-documented—it contains many lines of example settings and descriptions of them, all in the form of comments (lines beginning with #). Keeping your customizations near their corresponding examples/defaults/comments both minimizes the chance you'll define the same parameter in two different places, and, of course, it gives you easy access to information about the things you're changing.

By the way, I'm assuming Squid itself already is installed, configured and working the way you want it to (beyond blacklisting). If you haven't gotten that far before installing squidGuard, please refer to my previous three columns (see Resources).

Before those changes take effect, you need to restart Squid. On most Linux systems, you can use this command (omitting the sudo if you're already in a root shell):

bash-$ /etc/init.d/squid reload

If you get no error messages, and if when you do a ps -axuw |grep squid you see not only a couple Squid processes, but also five squidGuard processes, then congratulations! You've now got a working installation of squidGuard.

But is it actually doing what you want it to do? Given the filters we just put in place, the quickest way to tell is, on some client configured to use your Squid proxy, to point a browser to http://www.gotomypc.com (a site in the remotecontrol blacklist). If everything's working correctly, your browser will not pull up gotomypc, but rather Google. squidGuard is passive-aggressively encouraging you to surf to a safer site!

Conclusion

squidGuard isn't the only Squid add-on of interest to the security-conscious. squidtaild and squidview, for example, are two different programs for monitoring and creating reports from Squid logs (both of them are available in Ubuntu's universe repository). I leave it to you though to take your Squid server to the next level.

This concludes my introductory series on building a secure Web proxy with Squid. I hope you're off to a good, safe start!

Mick Bauer (darth.elmo@wiremonkeys.org) is Network Security Architect for one of the US's largest banks. He is the author of the O'Reilly book Linux Server Security, 2nd edition (formerly called Building Secure Servers With Linux), an occasional presenter at information security conferences and composer of the “Network Engineering Polka”.