Work the Shell

Web Server Tricks with $RANDOM

Dave Taylor

Issue #186, October 2009

Add some pseudo-randomness to your scripts and user interaction.

I just migrated onto a newer, bigger server (read that as “more expensive”, of course, but because my traffic's justifying it, I'm good with the change). To make matters more interesting, I also just bought a new laptop (a MacBook Pro), and between the two migrations, I've been looking through a lot of old directories and bumping into all sorts of scripts I've written in the past few years.

The one I thought would be interesting to explore here is one I wrote for a pal who was involved in a charity and wanted a way to have a single URL bounce people 50/50 to one of two different Web pages—a sort of mini-load balancer, though his application wasn't quite the same.

The core piece of this is the $RANDOM shell variable that's actually kind of magical—each time you reference it, you'll find it's different, even though you aren't actually assigning a new value to it. For example:

$ echo $RANDOM
21960
$ echo $RANDOM
19045
$ echo $RANDOM
2368
$ echo $RANDOM
2425
$ echo $RANDOM
10629

This violates the core user design principles of the shell and even the very definition of variables (which are supposed to be predictable: if you assign the value 37 to a variable, it still should have that value 200 lines and 17 references later). Other variables, like $PWD, change value based on what you're doing without you actually assigning them a new value, but because that's the present working directory, it's logical that its value changes as you move around in the filesystem.

The RANDOM value, however, is in a category of its own and makes it super easy to add some pseudo-randomness to your scripts and user interaction. (Whether it's truly random is a far more complicated, mind-numbingly complex issue. If you're interested, try Googling “determining the randomness of random numbers” to jump down that particular rabbit hole.)
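One practical consequence of the "pseudo" part: in bash, assigning a value to RANDOM seeds the generator, so the same seed replays the same sequence. Here is a minimal sketch of that behavior; the seed 42 and the variable names are arbitrary choices for illustration:

```shell
#!/usr/bin/env bash
# Assigning to RANDOM seeds bash's generator; the same seed
# replays the same sequence of values.
RANDOM=42
first_run="$RANDOM $RANDOM $RANDOM"

RANDOM=42
second_run="$RANDOM $RANDOM $RANDOM"

if [ "$first_run" = "$second_run" ] ; then
  echo "same seed, same sequence"
fi
```

The actual numbers can differ from one bash version to another, so a script should rely only on the sequence repeating, not on specific values.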

In the Bourne Again Shell (bash), RANDOM numbers are within the range of 0..MAXINT (32,767). To chop a value down to a useful range, you can simply take the remainder after dividing it by the number of possible values you want.

In other words, if you want one of ten possible values (0..9), for example, use the % “remainder” function with a call to expr (add 1 to the result if you'd rather have 1..10):

$ expr $RANDOM % 10
7
$ expr $RANDOM % 10
5
$ expr $RANDOM % 10
9
$ expr $RANDOM % 10
6
$ expr $RANDOM % 10
8
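Because the remainder of a division by 10 runs from 0 to 9, shifting the result into some other range takes one more step. A minimal sketch, assuming a helper called rand_between (a name made up for this example, not a bash built-in):

```shell
#!/usr/bin/env bash
# rand_between LOW HIGH: print a pseudo-random integer in LOW..HIGH,
# inclusive. (Hypothetical helper name, not a bash built-in.)
rand_between() {
  local low=$1 high=$2
  local span=$(( high - low + 1 ))   # how many distinct values we want
  echo $(( low + RANDOM % span ))
}

rand_between 1 10     # a value from 1 to 10
rand_between 50 59    # a value from 50 to 59
```

This works best when the span is small compared with 32,768. Unless the span divides 32,768 evenly, the lower values come up very slightly more often than the higher ones.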

Boiling this down further, how to choose between two options randomly now should be jumping out of the page at you, dear reader:

if [ "$(expr $RANDOM % 2 )" -eq "0" ] ; then
      conditional commands
fi

If you wanted to be a purist, you also could write this with the $(( )) math notation, of course, as you'll see a bit later in this column.
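For comparison, the same coin flip might look like this with the $(( )) notation, which avoids starting a separate expr process. The function name coin_flip and the heads/tails labels are placeholders for this sketch:

```shell
#!/usr/bin/env bash
# Coin flip using shell arithmetic instead of a call to expr.
# (coin_flip is a hypothetical name used only for this example.)
coin_flip() {
  if [ "$(( RANDOM % 2 ))" -eq 0 ] ; then
    echo "heads"
  else
    echo "tails"
  fi
}

coin_flip
```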

That's enough for us to write the shell script I mentioned earlier, the one that randomly switched between two possible pages when invoked:

#!/usr/local/bin/bash
url1="http://www.bing.com/"
url2="http://www.google.com/"
if [ "$(expr $RANDOM % 2 )" -eq "0" ] ; then
  echo "Location: $url1"; echo ""
else
  echo "Location: $url2"; echo ""
fi
exit 0

Can you see what this example script does? If you guessed “randomly redirects you to either Google or Bing”, you're right! If not, well, what the heck? Go back and read the code again!

Now, let's say my friend said “75% of the time, I really want to take people to URL1. Can you do it, Dave?”

Here's how that might look:

if [ "$(expr $RANDOM % 100 )" -lt "75" ] ; then

(Or, even more clearly as % 4 -lt 3, for that matter.)
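Dropped into the redirect script, the weighted test could be wrapped in a small function so the percentage is easy to change. This is only a sketch: chance is a made-up helper name, and the URLs are the same two used earlier.

```shell
#!/usr/bin/env bash
# chance PERCENT: succeed roughly PERCENT times out of 100.
# (Hypothetical helper, not part of bash.)
chance() {
  [ "$(( RANDOM % 100 ))" -lt "$1" ]
}

url1="http://www.bing.com/"
url2="http://www.google.com/"

if chance 75 ; then
  echo "Location: $url1"; echo ""
else
  echo "Location: $url2"; echo ""
fi
```

Since 32,768 divides evenly by 4 but not by 100, the % 4 -lt 3 form gives exactly three chances in four, while the % 100 form is off by a tiny fraction.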

If you have more than two choices, you can use a case statement, which makes uneven allocation a bit tricky but otherwise is straightforward:

case $(( $RANDOM % 4 )) in
  0 ) echo "$url1" ;;
  1 ) echo "$url2" ;;
  2 ) echo "$url3" ;;
  3 ) echo "$url4" ;;
esac
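One way to handle the uneven case is to widen the range of the remainder and let several values share a branch. The sketch below assumes three placeholder URLs on example.com and a 50/30/20 split, neither of which comes from the original script:

```shell
#!/usr/bin/env bash
# Uneven three-way split: ten possible remainders, shared out 5/3/2.
# The URLs and the pick_weighted name are placeholders for illustration.
url1="http://www.example.com/one"
url2="http://www.example.com/two"
url3="http://www.example.com/three"

pick_weighted() {
  case $(( RANDOM % 10 )) in
    0|1|2|3|4 ) echo "$url1" ;;   # 5 of 10 outcomes: about 50%
    5|6|7     ) echo "$url2" ;;   # 3 of 10 outcomes: about 30%
    8|9       ) echo "$url3" ;;   # 2 of 10 outcomes: about 20%
  esac
}

pick_weighted
```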

Load Balancing with ruptime

With this in mind, we could write an n-way load-balancing script, so that when people come to the home page, they automatically would be bounced to one of the n possible servers.
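Ignoring load for a moment, the n-way pick itself can be sketched without a case statement by shifting a random distance into the argument list. The function name pick_one and the host names passed to it are assumptions for this example:

```shell
#!/usr/bin/env bash
# pick_one ITEM...: print one of its arguments, chosen pseudo-randomly.
# (Hypothetical helper; works for any number of arguments.)
pick_one() {
  [ "$#" -gt 0 ] || return 1      # nothing to choose from
  shift $(( RANDOM % $# ))        # skip 0..(n-1) arguments
  echo "$1"
}

pick_one host1 host2 host3 host4
```

Adding a server then means adding one more argument rather than another branch.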

The interesting step actually would be to round-robin them, based on the server load, of course, which could be done by stepping through the data using the ruptime command.

The ruptime output for a single host looks like this:

$ ruptime host1
host1   16:51  up 3+53:17, 3 users, load 0.65 0.68 0.51

What we really want is to get a list of hostnames sorted by how busy those systems are, which can be generated by ruptime with the -rl flags, as shown here:

$ ruptime -r -l
host1   down   16+08:34
host4   up     10+13:26,   7 users,  load 0.07, 0.39, 1.04
host3   up     14+06:49,   3 users,  load 0.10, 0.38, 0.49
host2   up      1+17:40,   4 users,  load 0.18, 0.13, 0.09

As you can see, the first step is to screen out the hosts that aren't actually up at the present moment, then grab the first field (as it's sorted by how busy the system is at the current moment).

One approach to this could be to call ruptime every time a request comes in and just grab the first value. This can be done like so:

$ ruptime -rl | grep -v down | head -1 | cut -d\  -f1
host4
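A slightly stricter variation is to test the status column itself, so a host whose name happens to contain "down" isn't filtered out by accident. The sketch below swaps in awk for the grep, head and cut steps and feeds it the sample listing shown earlier rather than live ruptime output; first_up_host is a made-up name:

```shell
#!/usr/bin/env bash
# first_up_host: read ruptime-style lines on stdin and print the name
# of the first host whose status column is "up".
# (Hypothetical helper; awk replaces the grep | head | cut pipeline.)
first_up_host() {
  awk '$2 == "up" { print $1; exit }'
}

# Sample listing copied from above. On a live network you would run
#   ruptime -rl | first_up_host
# instead.
first_up_host <<'EOF'
host1   down   16+08:34
host4   up     10+13:26,   7 users,  load 0.07, 0.39, 1.04
host3   up     14+06:49,   3 users,  load 0.10, 0.38, 0.49
host2   up      1+17:40,   4 users,  load 0.18, 0.13, 0.09
EOF
```

For this sample the function prints host4, the first line that isn't marked down.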

The trouble is that the systems report uptime information only approximately every minute, so calling ruptime dozens or hundreds of times per second keeps returning the same answer until the next report arrives. Every request in that window goes to the same host, and the least-busy system will be swamped. If you get a lot of traffic, that's not going to be a manageable solution.

Here's where we could have our friend $RANDOM step back into the picture. Instead of always simply picking the machine with the lowest load average, let's randomly choose one of the three least-busy systems. The core snippet would look like this:

getline="$(( ( $RANDOM % 3 ) + 1 ))"
targethost="$(ruptime -rl | grep -v down |
   sed -n ${getline}p | cut -d\  -f1)"

With a bit more code, you could bias it so that, say, 50% of the time it would pick the least-busy system, 33% of the time it would pick the second-least-busy system, and 17% of the time it would pick the third-least-busy system. As time passed and as the load moved around, these systems would keep changing, and you'd achieve a crude but effective load-balancing system.
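The 50/33/17 split works out neatly if you take the remainder by 6 and give the three lines three, two and one of the six outcomes. A minimal sketch, assuming a helper named biased_line (a name invented here) that produces the line number for sed:

```shell
#!/usr/bin/env bash
# biased_line: print 1, 2 or 3 with odds of roughly 3:2:1.
# (Hypothetical helper name used only for this example.)
biased_line() {
  case $(( RANDOM % 6 )) in
    0|1|2 ) echo 1 ;;   # 3 of 6 outcomes: 50%
    3|4   ) echo 2 ;;   # 2 of 6 outcomes: about 33%
    5     ) echo 3 ;;   # 1 of 6 outcomes: about 17%
  esac
}

getline="$(biased_line)"
echo "would pick line $getline of the sorted host list"
```

The resulting getline value can then be dropped into the sed -n ${getline}p step of the snippet above in place of the evenly weighted one.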

Knowing how easily you can select one of a number of possible paths randomly in a shell script, what else can you imagine that would be helpful or just fun?