Work the Shell

Exploring Pipes, Test and Flow Control

Dave Taylor

Issue #141, January 2006

More shell script programming building blocks, and finally, our first small script.

Last month we started out easy, with a discussion of file redirection. This month I continue to talk about the basic building blocks of shell script programming by exploring pipes, then we jump into some basic programming statements so we can move into an interesting programming project.

Many people who start working with the Linux command line don't realize that it is unlike the world of the graphical interface, where programs are all standalone entities that can't really interact with each other (that is to say, Photoshop can't feed output directly to Microsoft Word, if you're a Windows person, or in Linux terms, The GIMP can't easily interact with OpenOffice.org). We've been taught to think of programs as autonomous, but when you're on the Linux command line, programs can all communicate with each other.

This is a real boon, because it means that instead of having roughly 1,800 different commands available, you actually have the equivalent of millions of different commands that can be put together to do just about anything you can imagine.

The key is the | (pipe) symbol, which hooks the output of the first command to the input of the second. For example, want to know how many files you have in the current directory (your home directory, if you've just logged in)? The simple solution is:

ls | wc -l

which is invoking the ls command to list files, but instead of displaying the output on the screen, it's actually fed to the wc (word count) program, with the -l option indicating that we want to have a count of the number of lines in the input stream, rather than the number of words or characters.
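To see what the -l option is choosing between, here is a minimal sketch that feeds the same two lines of text through wc three times. The sample text and the variable names are invented for illustration; only wc and its -l, -w and -c options are standard.

```shell
# Two lines of sample text (invented for this illustration).
sample='one two
three four five'

# printf supplies the text as the input stream for each pipeline.
lines=$(printf '%s\n' "$sample" | wc -l)   # count of lines
words=$(printf '%s\n' "$sample" | wc -w)   # count of words
chars=$(printf '%s\n' "$sample" | wc -c)   # count of bytes

echo "lines=$lines words=$words chars=$chars"
```

Each pipeline has the same shape as ls | wc -l: one command produces text, and wc counts whatever arrives on its input.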

Now, here's something a bit more complex. Let's say that you want to know how many files you have that were last modified in August. When you use the ls -l command, you notice that the lines are typified by the following:

drwxr-xr-x    11 taylor  taylor    374 Aug  16 21:57 ConnectSafely

This provides lots of information, but all we care about is that the month of last modification is shown as a three-letter abbreviation surrounded by spaces. The grep command makes it easy to match only specific patterns in the input stream, so now let's build a three-part pipeline that lists all files in the current directory, screens out everything that isn't from Aug, and then counts how many lines remain:

ls -l | grep " Aug " | wc -l

See how that works? You should be thinking that people with even a rudimentary grasp of the standard 20 or 30 Linux commands have a powerful interactive environment at their beck and call. You'd be right! (Note that if none of the files in the directory were last modified in August, the count will be 0. Move to a directory with a longer history, like /etc, and try the command again; odds are you'll find files from August there instead.)
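Why the spaces around Aug matter can be seen in a small sketch that runs grep over a canned listing rather than live ls -l output, so the result doesn't depend on what is in your directory. The first listing line is the one shown earlier; the other two lines and the variable names are made up for illustration.

```shell
# Sample "ls -l" style lines. Only the first comes from the article;
# the other two are invented for this illustration.
listing='drwxr-xr-x    11 taylor  taylor    374 Aug  16 21:57 ConnectSafely
-rw-r--r--     1 taylor  taylor   1204 Sep   2 09:15 August-notes.txt
-rw-r--r--     1 taylor  taylor     88 Aug   3 11:40 todo.txt'

# With surrounding spaces, only the month column can match.
with_spaces=$(printf '%s\n' "$listing" | grep " Aug " | wc -l)

# Without them, the file name August-notes.txt matches too.
without_spaces=$(printf '%s\n' "$listing" | grep "Aug" | wc -l)

echo "with spaces: $with_spaces, without spaces: $without_spaces"
```

The September file slips into the second count only because of its name, which is exactly the kind of false match the spaces are there to screen out.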

You can even have a pipeline that has its final output saved to a file by simply adding a redirect to the end of the pipeline:

ls -l | grep " Aug " > files.from.August

And, with the use of the little-known tee command, you can even save a copy of the data stream in the middle of a pipeline too:

ls -l | grep " Aug " | tee aug.output | wc -l

Here we have the same output as earlier, but now a copy of the intermediate results is neatly tucked away in the aug.output file. (A helpful tip too: you can use tee /dev/tty and have a copy of intermediate output shown on screen, even though it's also being fed to the next step in the pipeline.)
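A quick way to convince yourself that tee really passes the data along while saving it is to compare what wc counted with what landed in the file. This is only a sketch: the three input lines are invented stand-ins for ls -l output, and the temporary file name is whatever mktemp picks.

```shell
# Temporary file to receive tee's copy (mktemp chooses the name).
tee_copy=$(mktemp)

# Three invented input lines stand in for "ls -l" output.
match_count=$(printf '%s\n' "Aug report" "Sep report" "Aug summary" |
    grep "Aug" | tee "$tee_copy" | wc -l)

# The file holds the same lines that wc counted.
saved_count=$(wc -l < "$tee_copy")
saved_first=$(head -n 1 "$tee_copy")

echo "wc counted $match_count lines; tee saved $saved_count lines"
rm -f "$tee_copy"
```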

Thousands of Linux commands are accessible from the command line, and all but a small percentage are easily added to a command pipe. Given that a typical command also has at least a half-dozen different options to change its behavior, you can get a sense for just how rich the command-line environment is and why so many Linux power users and administrators still eschew the GUI for most of their work.

Flow Control and the Test Command

The next building block with shell scripting is flow control. This is an essential ingredient of any programming language, from the obscure APL to the now-pedestrian BASIC. Fortunately, there are a number of flow control elements available for shell script programmers, ranging from the most rudimentary if-then-else-fi to the more sophisticated while-do-done and until-do-done loops and case-esac blocks.

To look at flow control, it's necessary that we detour for a few minutes and talk about one of the most important commands for shell script programmers: test. The test command often is the program that evaluates conditional statements and ascertains whether the result is TRUE or FALSE, obviously a key capability for any sort of conditional flow control.

Believe it or not, the test command is linked to the [ command, which is why you can write conditional statements one of two ways, as exemplified here:

if test -f filename
if [ -f filename ]

This particular conditional tests to see if filename is indeed a file (the -f test). If you use the more readable [ notation, you are required to include the closing ] symbol too; whereas, if you use test overtly, you can skip any closing symbol on a conditional.
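The two spellings can be checked against each other by looking at the exit status each one returns, where 0 stands for TRUE and anything else for FALSE. The following is a minimal sketch; the temporary file and the variable names are assumptions made for the example.

```shell
# A temporary file that definitely exists (mktemp creates it).
probe=$(mktemp)

# Record the exit status of each form: 0 means TRUE, non-zero means FALSE.
status_test=0;    test -f "$probe"       || status_test=$?
status_bracket=0; [ -f "$probe" ]        || status_bracket=$?
status_missing=0; [ -f "$probe.absent" ] || status_missing=$?

echo "test: $status_test  [: $status_bracket  missing file: $status_missing"
rm -f "$probe"
```

Both forms agree on the file that exists, and the third check shows the non-zero status that an if statement would treat as FALSE.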

Tip: many modern shells have a version of the test command built in to the shell itself, considerably speeding up shell script execution because no separate program has to be launched. In those shells, both the [ symbol and the word test normally resolve to the built-in version, so either notation gets the performance benefit. Typing type test and type [ at the prompt shows which version your shell uses.

The test command has at least 30 different options, and it's critical that you become familiar with them, so you can understand how to test two alphanumeric strings (for example, filenames) versus how you might test numeric values (file sizes) or even perform a bewildering set of file and directory tests, including tests for execute permission, nonzero size, whether it's a pipe or socket and many more possibilities. To begin learning more about this command, type man test in your terminal.
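The difference between string, numeric and file tests is easiest to see side by side. The sketch below is illustrative only: check is a hypothetical helper written for this example, and the scratch directory and file names are invented.

```shell
# Hypothetical helper: run a test and report yes or no.
check() { if "$@" ; then echo yes ; else echo no ; fi ; }

# Scratch directory with one empty and one non-empty file.
work=$(mktemp -d)
: > "$work/empty.txt"
echo "hello" > "$work/full.txt"

string_equal=$(check test "07" = "7")      # = compares characters
numeric_equal=$(check test "07" -eq "7")   # -eq compares integer values
empty_has_size=$(check test -s "$work/empty.txt")   # -s: nonzero size?
full_has_size=$(check test -s "$work/full.txt")
is_directory=$(check test -d "$work")               # -d: a directory?

echo "string: $string_equal  numeric: $numeric_equal"
echo "empty -s: $empty_has_size  full -s: $full_has_size  -d: $is_directory"
rm -rf "$work"
```

The same pair of values, 07 and 7, is unequal as strings but equal as numbers, which is why picking the right operator matters.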

Armed with the test command, a standard if-then conditional is structured as shown:

if [ condition ]
then
        statement block if condition is true
else
        statement block if condition is false
fi

Oftentimes, you'll see programmers use a small shorthand by adding a semicolon, so that the first two lines are instead written as:

if [ condition ] ; then

Here's a quick example of how this might be used before I run out of space in this issue:

if [ -w . ] ; then
   echo "I can write to the current directory. "
else
  echo "I cannot write to the current directory. "
fi

As you can see, this offers a quick way to test whether you have write permission to the current directory. Type it into an editor (vi or emacs, whatever you prefer) and save it in your home directory as dir.write.sh. Then you can use cd to move to different directories and run this first shell script by typing sh ~/dir.write.sh to see whether you have write permission in that directory.
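One possible variation, sketched here on the assumption that you would rather name a directory than cd into it first, wraps the same conditional in a function. The function name report_writable and the scratch directory are hypothetical, introduced only for this example.

```shell
# Hypothetical function: the same -w test, applied to a named directory.
report_writable() {
    if [ -w "$1" ] ; then
        echo "I can write to $1"
    else
        echo "I cannot write to $1"
    fi
}

# Try it on a scratch directory and on a path that does not exist.
scratch=$(mktemp -d)
writable_msg=$(report_writable "$scratch")
missing_msg=$(report_writable "$scratch/no-such-dir")

echo "$writable_msg"
echo "$missing_msg"
rm -rf "$scratch"
```

The -w test is FALSE for a path that doesn't exist, so the else branch handles a mistyped directory name as well as a protected one.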

Out of space. Next month, we'll spend more time looking at conditional statements and flow control and start noodling on how to write a rudimentary blackjack game as a shell script. See you then!

Dave Taylor is a 25-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com.