Add some pseudo-randomness to your scripts and user interaction.
I just migrated onto a newer, bigger server (read that as “more expensive”, of course, but because my traffic's justifying it, I'm good with the change). To make matters more interesting, I also just bought a new laptop (a MacBook Pro), and between the two migrations, I've been looking through a lot of old directories and bumping into all sorts of scripts I've written in the past few years.
The one I thought would be interesting to explore here is one I wrote for a pal who was involved in a charity and wanted a way to have a single URL bounce people 50/50 to one of two different Web pages—a sort of mini-load balancer, though his application wasn't quite the same.
The core piece of this is the $RANDOM shell variable that's actually kind of magical—each time you reference it, you'll find it's different, even though you aren't actually assigning a new value to it. For example:
$ echo $RANDOM
21960
$ echo $RANDOM
19045
$ echo $RANDOM
2368
$ echo $RANDOM
2425
$ echo $RANDOM
10629
This violates the core user design principles of the shell and even the very definition of variables (which are supposed to be predictable: if you assign the value 37 to one, it still should have that value 200 lines and 17 references later). Other variables change value based on what you're doing, without you actually assigning them a new value, like $PWD, but because that's the present working directory, if you move around in the filesystem, it's logical that its value would change too.
The RANDOM value, however, is in a category of its own and makes it super easy to add some pseudo-randomness to your scripts and user interaction. (Whether it's truly random is a far more complicated, mind-numbingly complex issue. If you're interested, try Googling “determining the randomness of random numbers” to jump down that particular rabbit hole.)
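One quick way to see the "pseudo" part for yourself: in bash, assigning a number to RANDOM seeds the generator, so the same seed replays the same sequence. Here's a minimal sketch; the seed value 42 and the variable names are arbitrary choices of mine, and the behavior described is bash's, so other shells may differ.

```shell
#!/usr/bin/env bash
# Assigning a number to RANDOM seeds bash's generator, so the same seed
# replays the same sequence. The seed 42 is an arbitrary choice.
RANDOM=42
first_run="$RANDOM $RANDOM $RANDOM"

RANDOM=42
second_run="$RANDOM $RANDOM $RANDOM"

if [ "$first_run" = "$second_run" ]; then
  echo "same seed, same sequence"
fi
```

That's handy when you want a repeatable test run of a script that otherwise behaves randomly.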
In the Bourne Again Shell (bash), RANDOM numbers are within the range of 0..32,767. To chop that down and make it useful, you can simply divide by the number of possible values you want and keep the remainder.
In other words, if you want a random number in the range 0..9, for example, use the % “remainder” function with a call to expr (add 1 to the result if you'd rather have 1..10):
$ expr $RANDOM % 10
7
$ expr $RANDOM % 10
5
$ expr $RANDOM % 10
9
$ expr $RANDOM % 10
6
$ expr $RANDOM % 10
8
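If you'd rather skip the call to expr, the same remainder trick can be sketched with the shell's built-in arithmetic, shifting the result so the range starts wherever you like. The function name rand_between is hypothetical, made up here purely for illustration.

```shell
#!/usr/bin/env bash
# rand_between LOW HIGH: print a pseudo-random integer in LOW..HIGH inclusive.
# Hypothetical helper; assumes HIGH - LOW is smaller than 32768, the number
# of distinct values $RANDOM can produce.
rand_between() {
  local low=$1 high=$2
  local span=$(( high - low + 1 ))
  echo $(( RANDOM % span + low ))
}

rand_between 1 10    # prints a value from 1 to 10
```

Because 32,768 isn't an even multiple of 10, the lower remainders come up very slightly more often than the higher ones. For bouncing visitors between pages, that bias is small enough to ignore.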
Boiling this down further, how to choose between two options randomly now should be jumping out of the page at you, dear reader:
if [ "$(expr $RANDOM % 2 )" -eq "0" ] ; then
  conditional expression
fi
If you wanted to be a purist, you also could write this with the $(( )) math notation, of course, as you'll see a bit later in this column.
That's enough for us to write the shell script I mentioned earlier, the one that randomly switched between two possible pages when invoked:
#!/usr/local/bin/bash

url1="http://www.bing.com/"
url2="http://www.google.com/"

if [ "$(expr $RANDOM % 2 )" -eq "0" ] ; then
  echo "Location: $url1"; echo ""
else
  echo "Location: $url2"; echo ""
fi

exit 0
Can you see what this example script does? If you guessed “randomly redirects you to either Google or Bing”, you're right! If not, well, what the heck? Go back and read the code again!
Now, let's say my friend said “75% of the time, I really want to take people to URL1. Can you do it, Dave?”
Here's how that might look:
if [ "$(expr $RANDOM % 100 )" -lt "75" ] ; then
(Or, even more clearly as % 4 -lt 3, for that matter.)
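To avoid hard-coding the percentage, the test can be wrapped in a small function. This is only a sketch: the name choose_weighted and its argument order are my own invention, and the URLs are simply the two from the redirect script.

```shell
#!/usr/bin/env bash
# choose_weighted PERCENT FIRST SECOND: print FIRST about PERCENT% of the
# time and SECOND otherwise. Hypothetical helper for illustration.
choose_weighted() {
  local percent=$1 first=$2 second=$3
  if [ $(( RANDOM % 100 )) -lt "$percent" ]; then
    echo "$first"
  else
    echo "$second"
  fi
}

# The two URLs from the redirect script.
url1="http://www.bing.com/"
url2="http://www.google.com/"

echo "Location: $(choose_weighted 75 "$url1" "$url2")"
echo ""
```

A percentage of 100 always yields the first choice and 0 always yields the second, which makes the function easy to sanity-check.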
If you have more than two choices, you can use a case statement that makes uneven allocation a bit tricky but otherwise is straightforward:
case $(( $RANDOM % 4 )) in
  0 ) echo $url1 ;;
  1 ) echo $url2 ;;
  2 ) echo $url3 ;;
  3 ) echo $url4 ;;
esac
With this in mind, we could write an n-way load-balancing script, so that when people come to the home page, they automatically would be bounced to one of the n possible servers.
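One possible shape for the n-way version, sketched without a case statement so the list of servers can grow or shrink freely. The function name pick_one is hypothetical, and the example.com hostnames are placeholders rather than real servers.

```shell
#!/usr/bin/env bash
# pick_one ARG...: print one of its arguments, each with roughly equal odds.
# Hypothetical helper; needs at least one argument. It discards a random
# number of leading arguments (from 0 to count-1) and prints whichever
# argument is left at the front.
pick_one() {
  shift $(( RANDOM % $# ))
  echo "$1"
}

# Placeholder hostnames, not real servers.
target=$(pick_one "http://host1.example.com/" \
                  "http://host2.example.com/" \
                  "http://host3.example.com/")

echo "Location: $target"
echo ""
```

Adding a fourth server is then just a matter of adding a fourth argument; the modulus follows the argument count automatically.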
The interesting step actually would be to steer them based on the server load, of course, which could be done by stepping through the data from the ruptime command.
So, given the uptime output of:
$ ruptime host1
host1 16:51 up 3+53:17, 3 users, load 0.65 0.68 0.51
What we really want is to get a list of hostnames sorted by how busy those systems are, which can be generated by ruptime with the -rl flags, as shown here:
$ ruptime -r -l
host1 down 16+08:34
host4 up 10+13:26, 7 users, load 0.07, 0.39, 1.04
host3 up 14+06:49, 3 users, load 0.10, 0.38, 0.49
host2 up 1+17:40, 4 users, load 0.18, 0.13, 0.09
As you can see, the first step is to screen out the hosts that aren't actually up at the present moment, then grab the first field (as it's sorted by how busy the system is at the current moment).
One approach to this could be to call ruptime every time a request comes in and just grab the first value. This can be done like so:
$ ruptime -rl | grep -v down | head -1 | cut -d' ' -f1
host2
The trouble is that the systems report uptime information only approximately every minute, and calling ruptime dozens or hundreds of times per second can end up producing a problem—the least-busy system will be swamped. If you get a lot of traffic, that's not going to be a manageable solution.
Here's where we could have our friend $RANDOM step back into the picture. Instead of always simply picking the machine with the lowest load average, let's randomly choose one of the three least-busy systems. The core snippet would look like this:
getline="$(( ( $RANDOM % 3 ) + 1 ))"
targethost="$(ruptime -rl | grep -v down | sed -n ${getline}p | cut -d' ' -f1)"
With a bit more code, you could bias it so that, say, 50% of the time it would pick the least-busy system, 33% of the time it would pick the second-least-busy system, and 17% of the time it would pick the third-least-busy system. As time passed and as the load moved around, these systems would keep changing, and you'd achieve a crude but effective load-balancing system.
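Here's one way that bias might be sketched. Taking $RANDOM % 6 gives six equally likely outcomes, so assigning three of them to the first line, two to the second and one to the third yields roughly the 50/33/17 split. The function names pick_rank and pick_host are hypothetical, and pick_host reads ruptime-style output on standard input so it can be tried against saved output as well as the live command.

```shell
#!/usr/bin/env bash
# pick_rank: print 1, 2 or 3 with odds of roughly 3/6, 2/6 and 1/6.
# Hypothetical helper for illustration.
pick_rank() {
  case $(( RANDOM % 6 )) in
    0|1|2 ) echo 1 ;;   # three of six outcomes: least-busy host
    3|4   ) echo 2 ;;   # two of six outcomes: second-least-busy host
    5     ) echo 3 ;;   # one of six outcomes: third-least-busy host
  esac
}

# pick_host: read ruptime-style output on stdin, drop hosts that are down,
# and print the hostname on the randomly chosen line. Prints nothing if
# fewer hosts are up than the rank that was picked.
pick_host() {
  grep -v down | sed -n "$(pick_rank)p" | cut -d' ' -f1
}

# Intended use, assuming ruptime is available on your system:
#   targethost="$(ruptime -rl | pick_host)"
```

Changing the split is a matter of changing the modulus and how many outcomes land in each branch of the case statement.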
Knowing how easily you can select one of a number of possible paths randomly in a shell script, what else can you imagine that would be helpful or just fun?