Work the Shell

Numerology, or the Number 23

Dave Taylor

Issue #165, January 2008

Use a shell script to do basic numerology.

I admit it, I watch a lot of movies. In the decades I've been alive (a gentleman doesn't disclose his age!), I've watched tens of thousands of movies, and average about, oh, 6–8 movies/week. Truth be told, I prefer classic movies from the '40s and '50s, but my tastes range all over the map from cheesy horror films to the latest avant-garde foreign cinema.

When I realized that the deadline for this column was rushing up, I did what any self-respecting geek would do: I got sidetracked with something else. In this case, the something else was the surprisingly nuanced and interesting The Number 23, starring Jim Carrey and directed by Joel Schumacher.

In the movie, Carrey is obsessed with numerology and how so many of the things in his life add up to the number 23. He's “haunted by the number” and ultimately “attacked by the number” as the movie progresses through its twists and turns.

What I found interesting was the method by which he found 23 to be such a pervasive number, ranging from the character's birthday (February 3) to the time on a clock (2:15 is 2/3 if you look at an analog clock face). Numerology is all about the ordinal value of letters though, where A is 1, B is 2, and so on. So much of the movie is about how words and names add up to 23 too.

Ah, I thought, could I write a shell script that would do basic numerology? Could it be that this very magazine is infused with that evil number? Let's find out!

Breaking Words into Characters

The first step in writing a basic numerology script is to learn how to break down a word or phrase into its component parts, scrubbing it of all punctuation and white space. We also want to convert all uppercase to lowercase, or vice versa, as A and a should have the same numeric value (1).

This can be done with a single line in a script, thanks to the ever-powerful tr command:

tr '[A-Z]' '[a-z]' | tr -Cd '[:alnum:]' 

The first call to tr converts uppercase to lowercase, as required (though to be completely portable, I really should have written it as '[:upper:]' '[:lower:]', but I wanted to have both common idioms demonstrated here for your reading pleasure).

The second call to tr is a bit more tricky: the -d option instructs the program to delete characters in the input stream that match the specified set, and -C reverses the logic of the match. By using '[:alnum:]', I pull out only the letters and digits, stripping everything else.

Let's see this snippet at work:

$ echo "This Is A - 12,3 - Test" | \
  tr '[A-Z]' '[a-z]' | tr -Cd '[:alnum:]'
thisisa123test
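As a rough sketch of the more portable form mentioned above, the same cleanup can be wrapped in a small function that uses a POSIX character class for the case conversion too. The function name clean_phrase is an assumption made for illustration; it is not part of the original script:

```shell
#!/bin/sh
# clean_phrase is a hypothetical helper name used only for this sketch.
clean_phrase() {
    # Lowercase the input, then delete every character that is not
    # a letter or a digit (this also removes spaces and punctuation).
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -Cd '[:alnum:]'
}

clean_phrase "This Is A - 12,3 - Test"
echo    # tr -Cd strips newlines too, so add one for readability
```

Because the second tr deletes everything outside the alphanumeric set, the result carries no trailing newline, which matters as soon as the string's length is measured.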

And, that's neatly and easily done. Now, the tougher part—how do you step through a word, letter by letter, in a shell script? That's a job for the cut command!

I'm going to use a stepping integer variable to make life easier too, called ptr (here's an example of where a Perl or C for loop with all its power is sorely missed):

ptr=1
while [ some condition ] ; do
  letter="$( echo $cleanword | cut -c $ptr )"
  ptr="$(( $ptr + 1 ))"
done

The question is what condition should we be testing so that it'll get every character in the string, but nothing else? According to the cut man page, the program will produce a nonzero return code on failure, and it certainly seems to me that an invocation like this:

echo 123 | cut -c4 

should be an error, because there is no fourth character, but experimentation demonstrates that it isn't the case. Here's how I tested it:

#!/bin/sh
echo 123 | cut -c4
if [ $? -ne 0 ] ; then
  echo error condition
else
  echo no error condition
fi

Alas, the result is “no error condition”. On the positive side, cut does return a null string correctly, so we can test for that. But, because we're doing maximum paranoia coding, it's useful also to have the length of the word or phrase. After all, what if it's 23 characters long?

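Given that the length is already computed (with a quick call to wc -c), the conditional can simply test ptr against the string length, calculated after the string is cleaned up. One subtlety: echo appends a newline and wc -c counts it, so the computed length is one more than the number of characters in the string. That is why the test is strictly less-than: while [ $ptr -lt $basislength ].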
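To see how the choice of command affects the count, here is a small sketch comparing three ways of measuring a cleaned-up string. The variable names are invented for this illustration, and the tr -d ' ' is there only because some wc implementations pad their output with spaces:

```shell
#!/bin/sh
word="linux"

# echo appends a newline, and wc -c counts it along with the letters.
with_newline=$(echo "$word" | wc -c | tr -d ' ')

# printf '%s' writes no newline, so only the letters are counted.
without_newline=$(printf '%s' "$word" | wc -c | tr -d ' ')

# The shell can also report the length of a variable directly.
shell_length=${#word}

echo "echo   | wc -c: $with_newline"
echo "printf | wc -c: $without_newline"
echo "parameter length: $shell_length"
```

With either of the last two measurements, the loop test would need to be -le rather than -lt to reach the final letter.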

Calculating Letter Value

The hardest part of this script unquestionably is mapping letters to numeric values. Perl, C, Awk and just about every scripting language have a solution, but within the shell itself? There's nothing I can imagine without extraordinary levels of effort.

Fortunately, there's a compact Perl one-liner that lets us write a command suitable for dropping into a command pipe. It reads a single character and subtracts 96 from its ASCII code, because lowercase a is 97:

perl -e '$a=getc(); print ord($a)-96' 

Thus, we have a tool to calculate the ordinal value without too much difficulty, now that we know how to extract individual letters:

ordvalue="$(echo $letter | \
  perl -e '$a=getc(); print ord($a)-96' )"
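For comparison, and only as a sketch: POSIX printf treats a numeric argument that begins with a quote character as a request for the numeric code of the character that follows, which offers one way to get the same value without leaving the shell. The helper name letter_value is hypothetical, and the subtraction assumes lowercase ASCII letters, just as the Perl version does:

```shell
#!/bin/sh
# letter_value is a hypothetical helper name, not from the original column.
letter_value() {
    # The leading single quote inside the argument makes printf print
    # the character's numeric code instead of parsing it as a number.
    code=$(printf '%d' "'$1")
    # In ASCII, a is 97, so subtracting 96 maps a..z onto 1..26.
    echo $(( code - 96 ))
}

letter_value l    # prints 12
letter_value x    # prints 24
```

Anything other than a lowercase letter, a digit for example, falls outside the 1 to 26 range with either approach, so the input still has to be filtered first.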

Let's put it all together and see where we are:

#!/bin/sh
# Given a word or phrase, figure out its numeric equivalents

ptr=1

# Use the command-line arguments if given; otherwise prompt
if [ -z "$1" ] ; then
  echo -n "Word or phrase: "
  read basis
else
  basis="$*"
fi

# Lowercase everything, then strip all but letters and digits
basis="$( echo "$basis" | \
   tr '[A-Z]' '[a-z]' | \
   tr -Cd '[:alnum:]' )"

# echo appends a newline, so wc -c reports length + 1
basislength="$( echo "$basis" | wc -c )"

echo "(Working with $basis which has $basislength characters)"

# -lt (not -le) compensates for the counted newline
while [ $ptr -lt $basislength ] ; do
  letter="$( echo "$basis" | cut -c $ptr )"
  ordvalue="$(echo "$letter" | \
     perl -e '$a=getc(); print ord($a)-96' )"
  echo "... letter $letter has value $ordvalue"
  ptr="$(( $ptr + 1 ))"
done

exit 0

The conditional at the top lets this script be maximally flexible. If you specify a word or phrase when you invoke the script, it'll use that. If you forget, it'll prompt you to enter a word or phrase. Either way, that ends up as basis, which is then cleaned up to remove unwanted characters. basislength is the count reported by wc -c, which includes the newline that echo appends, so the five-letter word linux is reported as 6 characters. The while loop then steps through the string, letter by letter.

A quick test:

$ sh numerology.sh
Word or phrase: linux
(Working with linux which has 6 characters)
... letter l has value 12
... letter i has value 9
... letter n has value 14
... letter u has value 21
... letter x has value 24

Great. We have the basis of a numerology calculator with all the difficult work taken care of. All that's left is to do some summary values and push around possible combinations to see if we can ascertain whether that pesky 23 does indeed show up everywhere!

Acknowledgement

Thanks to Dave Sifry for his help with that spiffy little Perl code snippet.

Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com, and he also offers up tech support at AskDaveTaylor.com.