Set and forget policy management with Squid Proxy server.
Large enterprises and nuclear laboratories aren't the only organizations that need an Internet access policy and a means of enforcing it. My household has an Internet access policy, and the technique I've used to enforce it is applicable to almost any organization. In our case, I'm not too concerned about outside security threats. Our network is is behind a NAT router, and our Wi-Fi has a ridiculously ugly password. Our workstations are either Linux or properly patched Windows machines (if there is such a thing). No, our concerns come from inside our network: our kids like to play Web-based games, and that often gets in the way of chores and homework.
We're also concerned they might stumble upon Web content that we'd rather they not access. So no, we're not protecting nuclear secrets or intellectual property, but we are enabling the household to run smoothly without undue distractions.
In general, my wife and I don't care if our kids play games on-line or stream media. But, if their homework or chores don't get completed, we want a means of “grounding” them from this content. The problem is that we also home school, and much of their educational content is also on-line. So, we can't simply block their access. We need something a bit more flexible.
When I set out to solve this problem, I made a list of the goals I wanted to accomplish:
I don't want managing my kid's Internet access to become a full-time job. I want to be able to set a policy and have it implemented.
My wife doesn't want to know how to log in, modify a configuration file and restart a proxy dæmon. She needs to be able to point her browser, check a few boxes and get on with her life.
I don't want to write too much code. I'm willing to write a little bit of code, but I'm not interested in re-inventing the wheel if it already exists.
I want to be able to enforce almost any policy that makes sense for our household.
I don't want anything I do to break their Internet access when they take their laptops outside the house.
I'm sure my household isn't the only organization interested in these results. However, I made an assumption that may not make sense in other organizations: my kids won't be taking any sophisticated measures to circumvent our policy. However, I do reserve the right to participate in the arms race if they do.
For the purpose of this article, anytime this assumption leads to a configuration that may not make sense in more sophisticated environments, I'll try to discuss a few options that will allow you to strengthen your configuration.
I wasn't able to find any single software package that was flexible enough to do what I wanted and also easy enough to use, so that it wouldn't take considerable effort on the part of my wife and me to employ it. I was able to see that the Squid proxy server had the potential of doing what I wanted with just a little bit of coding on my part. My code will tell the proxy server how to handle each request as it comes in. The proxy either will complete the request for the user or send the user a Web page indicating that the site the user is trying to access has been blocked. This is how the proxy will implement whatever policy we choose.
I've decided that I want to be able to give my family members one of four levels of Internet access. At the two extremes, family members with “open” access can go just about anywhere they want, whereas family members with “blocked” access can't go anywhere on the Internet. My wife and I will have open access, for example. If one of the boys is grounded from the Internet, we'll simply set him as blocked.
However, it might be nice to be able to allow our kids to go to only a predetermined list of sites, say for educational purposes. In this case, we need a “whitelist-only” access level. Finally, I'm planning on a “filtered” access level where we can be a bit more granular and block things like music download, Flash games and Java applets. This is the access level the boys generally will have. We then can say “no more games” and have the proxy enforce that policy.
Because I don't want to write an actual interface for all of this, I simply use phpMyAdmin to update a database and set policy (Figure 1). In order to grant a particular access level, I simply update the corresponding cell in the grid, with 1 being on, and 0 being off.
Policy enforcement also will require some client configuration, which I'll discuss in a moment. However, I'm also going to discuss using OpenDNS as a means of filtering out things that I'd rather not spend my time testing and filtering. This is a good example of a security-in-depth posture.
I've configured OpenDNS to filter out the content that I don't anticipate ever changing my mind about. I don't think there's any reason for my family to be able to access dating sites, gambling sites or porn sites (Figure 2). Although not perfect, the OpenDNS people do a pretty good job of filtering this content without me having to do any testing myself. When that kind of testing fails, it has the potential for some really awkward moments—I'd just assume pass.
Earlier in this article, I mentioned that this would require some client configuration. Most Web browsers allow you to configure them to use a proxy server to access the Internet. The naïve approach is simply to turn on proxy access by checking the check box. However, if my kids take their laptops to the library, where our proxy isn't available, they won't be able to access the Internet, and that violates goal number five. So, I've opted to use the automatic proxy configuration that most modern browsers support. This requires that I write a JavaScript function that determines how Web sites are to be accessed, either directly or via a proxy (Listing 1).
Every time your browser accesses a Web site, it calls the FindProxyForURL() function to see what method it should use to access the site: directly or via a proxy. The function shown in Listing 1 is just an example, but it demonstrates a few use cases that are worth mentioning. As you can see from line 15, you can return a semicolon-delimited list of methods to use. Your browser will try them in turn. In this case, if the proxy happens to be inaccessible, you will fall back to DIRECT access to the Web site in question. In a more strict environment, that may not be the correct policy.
On line 11, you can see that I'm ensuring that Web sites on our local network are accessed directly. On line 7, I'm demonstrating how to test for particular hostnames. There are a few Web sites that I access through a VPN tunnel on my workstation, so I cannot use the proxy. Finally, on line 3, you see something interesting. Here, I'm testing to see if a particular hostname is resolvable to an IP address. I've configured our LAN's DNS server to resolve that name, but no other DNS server would be able to. This way, when our kids take their laptops out of our network, their browser doesn't try to use our proxy. Sure, we simply could fail over to direct access like we did on line 15, but fail over takes time.
The automatic proxy configuration is something that a more sophisticated user could circumvent. There are add-ins for various browsers that would prevent the user from changing this configuration. However, that wouldn't prevent the user from installing a new browser or starting a new Firefox profile. The fool-proof method to enforce this policy is at the gateway router: simply set a firewall rule that prevents access to the Web coming from any IP address except the proxy. This even could be done for specific client-host combinations, if needed.
While you're adding firewall rules to your gateway router, you might be tempted to configure the router to forward all Web traffic through the proxy, forming what often is called a transparent proxy. However, according to RFC 3143, this isn't a recommended configuration, because it often breaks things like browser cache and history.
So now that I've discussed client, DNS and possible router configuration, it's time to look at the Squid proxy server configuration. The installation itself was pretty straightforward. I just used my distribution's package management system, so I won't discuss that here. The Squid proxy provides a lot of knobs that you can turn in order to optimize its cache and your Internet connection. Even though performance improvements are a nice ancillary benefit from implementing the proxy server, those configuration options are beyond the scope of this discussion. That leaves the single configuration change that is necessary in order plug my code into the system. All that was needed was to edit the /etc/squid/squid.conf file and add a single line:
redirect_program /etc/squid/redirector.pl
This one directive essentially tells the Squid proxy to “ask” my program how to handle every request that clients make. The program logic is pretty simple:
Listen on STDIN for requests.
Parse the request.
Make a decision based on policy.
Return the answer to the proxy.
Let's look at the sample code in Listing 2.
The main loop begins on line 11, where it reads from STDIN. Lines 11–24 mostly are concerned with parsing the request from the Squid proxy. Lines 25–28 are where the program queries the database to see what the particular client's permissions are. Lines 29–57 check to see what permissions were read in from the database and return appropriately. In the case where the client is allowed “filtered” access to the Internet, I have a skeleton of the logic that I have in mind. I didn't want to bog this article down with trivial code. It was more important to demonstrate the structure and general logic of a Squid proxy redirector than it was to supply complete code. But you can see that I could implement just about any conceivable access policy in just a few lines of code and regular expressions.
The send_answer() function starting on line 67 really doesn't do much at this point, but in the future, I could add some logging capability here pretty easily.
The is_on_list() function starting on line 72 is perhaps a bit interesting. This function takes the hostname that the client is trying to access and breaks it up into a list of subdomains. Then it checks if those subdomains are listed in the database, whose name was passed in as a parameter. This way, I simply can put example.com in the database, and it will match example.com, www.example.com or webmail.example.com, for example.
By passing in different table names, I can use the same matching algorithm to match any number of different access control lists.
As you can see, the code really isn't very complex. But, by adding a bit more complexity, I should be able to enforce just about any access policy I can imagine. There is, however, one area that needs to be improved. As written, the program accesses the database several times for each access request that it handles. This is extremely inefficient, and by the time you read this, I probably will have implemented some sort of caching mechanism.
However, caching also will make the system less responsive either to changes to access policy or access control lists, as I will have to wait for the cached information to expire or restart the proxy dæmon.
In practice, I've seen something that is worth mentioning. Most Web browsers have their own caching mechanism. Because of this cache, if you change an access policy at the proxy, your clients aren't always aware of the change. In the case where you “open up” access, customers will need to refresh their cache in order to access previously blocked content. In the case where you restrict access, that content still may be available until the cache expires. One solution is to set the local cache size to 0 and simply rely upon the proxy server's cache.
Also, once the clients have been configured to talk to a proxy on the local network, it becomes possible to swap in different proxies or even to daisy-chain proxies without the client needing to do anything. This opens up the possibility of using Dan's Guardian, for example, to do content filtering in addition to access control.
By this time, many of you might think I'm some kind of uber-strict control freak. However, my family spends a lot of time on the Internet—sometimes to a fault. Most of the time, my family members use the Internet in an appropriate manner, but when they don't, my wife and I need a means of enforcing household rules without having to keep a constant watch over our kids.