Work the Shell

Tackling L33t-Speak

How to script a l33t-speak translator. By Dave Taylor

My daughter and I were bantering with each other via text message this morning as we often do, and I dropped into a sort of mock "leet speak". She wasn't impressed, but it got me thinking about formulaic substitutions in language and how they represent interesting programming challenges.

If you're not familiar with "leet speak" it's a variation on English that some youthful hackers like to use—something that obscures words sufficiently to leave everyone else confused but that still allows reasonably coherent communication. Take the word "elite", drop the leading "e" and change the spelling to "leet". Now replace the vowels with digits that look kind of, sort of the same: l33t.

There's a sort of sophomoric joy in speaking—or writing—l33t. I suppose it's similar to pig latin, the rhyming slang of East Londoners or the reverse-sentence structure of Australian shopkeepers. The intent's the same: it's us versus them and a way to share with those in the know without everyone else understanding what you're saying.

At their heart, however, many of these things are just substitution ciphers. For example, "apples and pears" replaces "stairs", and "baked bean" replaces "queen", in Cockney rhyming slang.

It turns out that l33t speak is even more formalized, and there's actually a Wikipedia page that outlines most of its rules and structure. I'm just going to start with word variations and letter substitutions here.

The Rules of L33t Speak

Okay, I got ahead of myself. There aren't "rules", because at its base, leet speak is a casual slang, so l33t and 733T are both valid variations of "elite". Still, there are a lot of typical substitutions, like dropping an initial vowel, replacing vowels with numerical digits or symbols (think "@" for "a"), replacing a trailing "s" with a "z", "cks" with "x" (so "sucks" becomes "sux"), and the suffixed "ed" becomes either 'd or just the letter "d".

All of this very much lends itself to a shell script, right? So let's test some mad skillz!

For simplicity, let's parse command-line arguments for the l33t.sh script and use some level of randomness to ensure that it's not too normalized. How do you do that in a shell script? With the variable $RANDOM. In modern shells, each time you reference that variable, you'll get a different value somewhere in the range of 1..MAXINT. Want to "flip a coin"? Use $(($RANDOM % 2)), which will return a zero or 1 in reasonably random order.

So the fast and easy way to go through these substitutions is to use sed—that old mainstay of Linux and UNIX before it, the stream editor. Mostly I'm using sed here, because it's really easy to use substitute/pattern/newpattern/—kind of like this:


word="$(echo $word | sed "s/ed$/d/")"

This will replace the sequence "ed" with just a "d", but only when it's the last two letters of the word. You wouldn't want to change education to ducation, after all.

Here are a few more that can help:


word="$(echo $word | sed "s/s$/z/")"
word="$(echo $word | sed "s/cks/x/g;s/cke/x/g")"
word="$(echo $word | sed "s/a/@/g;s/e/3/g;s/o/0/g")"
word="$(echo $word | sed "s/^@/a/")"
word="$(echo $word |  tr "[[:lower:]]" "[[:upper:]]")"

In order, a trailing "s" becomes a trailing "z"; "cks" anywhere in a word becomes an "x", as does "cke"; all instances of "a" are translated into "@"; all instances of "e" change to "3"; and all instances of "o" become "0". Finally, the script cleans up any words that might start with an "a". Finally, all lowercase letters are converted to uppercase, because, well, it looks cool.

How does it work? Here's how this first script translates the sentence "I am a master hacker with great skills":


I AM A M@ST3R H@XR WITH GR3@T SKILLZ 

More Nuances

It turns out there are even more nuances of Leet, and I didn't realize that most often the letter "a" is actually turned into a "4", not an "@", although as with just about everything about the jargon, it's somewhat random.

In fact, every single letter of the alphabet can be randomly tweaked and changed, sometimes from a single letter to a sequence of two or three symbols. For example, another variation on "a" is "/-\" (for what are hopefully visually obvious reasons).

Continuing in that vein, "B" can become "|3", "C" can become "[", "I" can become "1", and one of my favorites, "M" can change into "[]V[]". That's a lot of work, but since one of the goals is to have a language no one else understands, I get it.

There are additional substitutions: a word can have its trailing "S" replaced by a "Z", a trailing "ED" can become "'D" or just "D", and another interesting one is that words containing "and", "anned" or "ant" can have that sequence replaced by an ampersand (&).

Let's add all these L337 filters and see how the script is shaping up.

But First, Some Randomness

Since many of these transformations are going to have a random element, let's go ahead and produce a random number between 1–10 to figure out whether to do one or another action. That's easily done with the $RANDOM variable:


doit=$(( $RANDOM % 10 ))       # random virtual coin flip

Now let's say that there's a 50% chance that a -ed suffix is going to change to "'D" and a 50% chance that it's just going to become "D", which is coded like this:


if [ $doit -ge 5 ] ;  then
  word="$(echo $word | sed "s/ed$/d/")"
else
  word="$(echo $word | sed "s/ed$/'d/")"
fi

Let's add the additional transformations, but not do them every time. Let's give them a 70–90% chance of occurring, based on the transform itself. Here are a few examples:


if [ $doit -ge 3 ] ;  then
  word="$(echo $word | sed "s/cks/x/g;s/cke/x/g")"
fi

if [ $doit -ge 4 ] ;  then
  word="$(echo $word | sed "s/and/\&/g;s/anned/\&/g;
     s/ant/\&/g")"
fi

And so, here's the second translation, a bit more sophisticated:


$ l33t.sh "banned? whatever. elite hacker, not scriptie."
B&? WH4T3V3R. 3LIT3 H4XR, N0T SCRIPTI3.

Note that it hasn't realized that "elite" should become L337 or L33T, but since it is supposed to be rather random, let's just leave this script as is. Kk? Kewl.

If you want to expand it, an interesting programming problem is to break each word down into individual letters, then randomly change lowercase to uppercase or vice versa, so you get those great ransom-note-style WeiRD LeTtEr pHrASes.

About the Author

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.

Dave Taylor