Letters to the Editor

Michael K. Johnson

Issue #28, August 1996

Readers sound off.

SPARC?

I was a SPARC user until I encountered a hardware problem. I found it difficult to get service for the SPARC hardware; small site end users don't get much support from SUN, it seems.

The Linux/x86 world is just the opposite. There is a very competitive market offering a wide variety of options in all price/performance ranges. It creates a truly affordable Unix computing system.

I wish the Linux community would stay with x86 instead of making Linux just another piece of software on those expensive proprietary boxes.

—Siuki Chan siuki.chan@xilinx.com

Why Not?

When Unix was first written it was for a PDP-7 (and written in assembly language). Subsequent ports to additional hardware helped turn Unix into the machine-independent system it is today. That 25-year history and machine independence have also brought additional overhead.

Linux offers a fresh start. Porting Linux to other architectures can bring a new machine-independent operating system into general use. While many people today see the x86 as the right answer, I am sure people felt the same way about the PDP-11 (which was the second platform for Unix). Designing portability into Linux now means it will be much easier to get running on the hardware of the future—whether it be based on the SPARC, DEC Alpha, MIPS or something totally new.

—Haykel Ben Jemia haykel@cs.tu-berlin.de

AWKward mistake

I would like to say that I really like LJ and that through it I learn something new about Linux every month. In issue 25 (May 1996), I found the article Introduction to Gawk very interesting because I often use Gawk. But when trying things out, I noticed something wrong with the output of the FS statement. The article stated that

{
    FS=":"
    print $5
}

and

BEGIN {
    FS=":"
}
{
    print $5
}

are functionally identical, except that the first one is slower.

But when I executed the first program, the first line was blank. With the second program, everything was okay. So I asked on the Red Hat mailing list (because I use the Red Hat distribution) if someone could help me. Marc Ewing provided the real answer to the problem:

The line is split into fields before the rule is evaluated, so when the FS=":" is evaluated the first time, the line has already been split up, and either no fifth field exists, or in some situations the fields are wrong. So the two awk programs are not functionally identical; the first program is incorrect.
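The difference can be reproduced with a small sketch. It assumes a POSIX-conforming awk such as gawk or mawk, and the two sample records are hypothetical stand-ins for /etc/passwd entries; the variable names `sample`, `late` and `early` are illustrative only.

```shell
# Hypothetical sample data in /etc/passwd layout; field 5 is the full name.
sample='alice:x:1000:1000:Alice Example:/home/alice:/bin/sh
bob:x:1001:1001:Bob Example:/home/bob:/bin/sh'

# FS is assigned inside the main rule, after record 1 has already been
# split on whitespace, so $5 is empty for that record.
late=$(printf '%s\n' "$sample" | awk '{ FS = ":"; print $5 }')

# FS is assigned in BEGIN, before any record is read.
early=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":" } { print $5 }')

printf 'late:\n%s\n\nearly:\n%s\n' "$late" "$early"
```

With such an awk, the "late" output should start with a blank line where the first name belongs, while the "early" output lists both names.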

I hope this information will be useful for someone.

Oops

That was tested before being put in the magazine, but the bug was hard to notice (and was missed) because the /etc/passwd file on the machine used to test the script ran the output off the top of the screen. Thanks for bringing this to our attention.

lcc for ELF?

In Issue 25 of Linux Journal, the lcc compiler was reviewed [Introduction to Awk, by Ian Gordon] and the FTP site for lcc was listed as ftp://ftp.cs.princeton.edu/pub/lcc/. However, I can only find a.out ports of lcc to Linux. Is anybody working on ELF support in lcc?

—Arthur D. Jerijian

Yes

The file ftp://ftp.cs.princeton.edu/pub/lcc/contrib/linux-elf.tar.gz is dated November 14th, 1995, so Linux ELF support for lcc has been around for a while.

More AWKward mistakes

To the editor:

I would like to comment on the article on gawk [An Introduction to Awk] by Ian Gordon in the May Linux Journal. Overall it is a nice introduction to the joys of awk programming, but I wish you had let me review it first.

There are a number of minor and not so minor errors in the article. In order of appearance:

1. Brian Kernighan wasn't one of the original designers of C; he “merely” wrote the book on it with Dennis Ritchie, who designed and implemented C. (Not to diminish his stature in any way; Brian is still a very important and seminal figure in the Unix and C world.)

2. The article says, “gawk also defines several special patterns which do not match any input at all, the most commonly used being BEGIN and END”. This is incorrect. Only BEGIN and END are defined in awk; there are no others.

3. The statement “If you try to refer to fields beyond NF, their value will be NULL”, if read literally, is misleading. The value is the null or empty string, often denoted "". Granted, most programmer types would understand the statement at face value; maybe I'm just being pedantic.
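A quick check of point 3, as a sketch assuming a POSIX-style awk and a made-up two-field input line (the shell variable `result` is illustrative only):

```shell
# Two fields of input, so NF is 2 and $5 lies beyond NF.
result=$(printf 'one two\n' | awk '{
    # A field beyond NF has length 0 and compares equal to "".
    printf "NF=%d len=%d empty=%s\n", NF, length($5), ($5 == "" ? "yes" : "no")
}')
printf '%s\n' "$result"
```

Merely reading $5 does not change NF; only assigning to a field beyond NF would extend the record.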

4. There is a major error in the part that describes using a colon as the field separator.

        {
                FS = ":"
                print $5
        }

In gawk, field splitting occurs using the value of FS at the time the record was read. Thus, $5 will already have been determined, based on the previous value of FS (presumably a space, " "). Unix versions of awk do this incorrectly, delaying field splitting until a field is needed, but doing so with whatever value of FS is current. This is incorrect, and the POSIX standard for awk mandates that field splitting happen the way gawk (and mawk, see below) do it. In fact, my book (cited in the RESOURCES sidebar, thanks!) describes this exact issue.

The correct way to get the desired behaviour is to set FS either using the -F option, or using an assignment inside the BEGIN block, as mentioned later.
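Both approaches can be sketched as follows, assuming a hypothetical one-line input in /etc/passwd layout; the shell variable names are illustrative only.

```shell
# Hypothetical one-line input in /etc/passwd layout.
line='carol:x:1002:1002:Carol Example:/home/carol:/bin/sh'

# Option 1: set the separator on the command line with -F.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $5 }')

# Option 2: assign FS in a BEGIN block, which runs before any input is read.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $5 }')

printf '%s\n%s\n' "$with_flag" "$with_begin"
```

Either way the separator is in place before the first record is split, so even a single-line input yields the fifth field.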

5. Some typos: “does not contain a seven field” should be “seventh”, and “modifing” should be “modifying”. And a nit: calling the Info file a “page” is misleading. When printed, the current documentation is over 330 pages...

6. When talking about variable initialization, the article says “... setting it to 0 for an integer or "" for an integer or a string, respectively.” Not quite. Variables are initialized to 0 for their numeric value and "" for their string value. All numbers in awk are maintained internally as C doubles. Numbers that look like integers are still stored as doubles. This can lead to confusion for C programmers:

        x = 5 / 4       # x is now 1.25, not 1, no integer truncation

(I've been bitten by this one, myself!)
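Both points can be seen in a short sketch. The variable name `never_set` is hypothetical and deliberately never assigned; the use of int() to get a C-style truncated result is an addition here, not something the article discussed.

```shell
out=$(awk 'BEGIN {
    # never_set is a hypothetical name that is never assigned.
    # It acts as 0 in numeric context and as "" in string context.
    printf "num=%d str=[%s]\n", never_set + 0, (never_set "")
    # Division is floating point; int() truncates toward zero.
    x = 5 / 4
    printf "x=%s int=%d\n", x, int(x)
}')
printf '%s\n' "$out"
```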

7. The discussion of the array “for” loop is incomplete.

        for (i in theArray) print i

prints each index in theArray. To get both the indices and the corresponding values, you would need something like

        for (i in theArray) print i, theArray[i]

8. A word about implementation speed and comparisons to Perl. There are three freely available awk implementations: the Bell Labs version, gawk, and mawk. Gawk is much much faster than the Bell Labs version. Mawk (ftp from oxy.edu), by Michael Brennan, is a very nice implementation that is (generally) even faster than gawk. Although I haven't done any timings, I'm willing to bet that an awk program run with mawk will give a comparable Perl program a really good run for its money, every time. Gawk's advantages over mawk are its additional features, its ports to more systems, and its comprehensive documentation. Mawk's advantages are its speed and rock solidness.

9. In the resource block, the title of the gawk documentation from the FSF is now The GNU Awk User's Guide, with just myself listed as the author. Because the manual changed significantly (it's now about double the previous size), we changed the title, and I am listed as the author because of all the new and heavily revised material in the guide. The title page does give credit where credit is due, saying “Based on The Gawk Manual, by Close, Robbins, Rubin and Stallman.”

Finally, I would like to point out that many Linux distributions apparently don't yet have the latest version, 3.0.0; this should be gotten from a GNU mirror. There are a large number of nifty new features and bug fixes over the previous version, as well as the revised manual.

Please accept this note as constructive comments on an otherwise enjoyable article, one that I wish I had had time to write...

Thanks,

—Arnold Robbins gawk maintainer and documenter arnold@gnu.ai.mit.edu