Literate Programming Using Noweb

Andrew L. Johnson

Brad C. Johnson

Issue #42, October 1997

Noweb is a tool designed to enable a programmer to write documentation and code at the same time, with the goal of producing code that is easy to understand and maintain.

In essence, the purpose of literate programming (LP) can be found in the following quote:

“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”—Donald E. Knuth, 1984.

Such an environment reverses the notion of including documentation, in the form of comments within the code, to one where the code is embedded within a program's description. In doing so, literate programming facilitates the development and presentation of computer programs that more closely follow the conceptual map from the problem space to the solution space. This, in turn, leads to programs that are easier to debug and maintain.

When writing literate programs, one specifies the program description and the program code in a single source file, in the order best suited to human understanding. The program code can be extracted and assembled from this file into a form which the compiler or interpreter can understand—a process called “tangling”. Documentation is produced by “weaving” the description and code into a form ready to be typeset (most often by TeX or LATeX).

Many different tools have been created for literate programming over the years. Most of the more popular are based, either directly or conceptually, on the WEB system created by D. E. Knuth (“Literate Programming”, The Computer Journal (27)2:97-111, 1984). This article focuses on Norman Ramsey's noweb—a simple to use, extensible literate programming tool that is independent of the target programming language.

Overview of the Noweb System

When you write a literate program using noweb, you create a simple text file (which by convention has a .nw extension) in which you provide all of the technical documentation for the various parts of the program, along with the actual source code for each part of the program.

Figure 1. Typeset Version of Perl Script

This file ( Listing 1 ), which we call the nw source file, is then processed by noweave to create the documentation in a form ready for typesetting (the typeset version of the program is shown in Figure 1), or by notangle to extract the code chunks and assemble them in their proper order for the compiler or interpreter (the executable version of the program is in Listing 2 ). These two processes are not stand-alone programs, but a set of filters through which the nw source file is piped. It is this pipeline system that makes noweb both flexible and extensible, since the pipelines can be modified and new filters can be created and inserted in the pipelines to change the behavior of noweb.

Like most literate programming tools, noweb depends on TeX or LATeX—(LA)TeX to refer to either—for typesetting the documentation (although it has options for producing HTML output as well). However, one need not be a (LA)TeX guru to produce good results. All of the hard work of cross-referencing, indexing and typesetting the code is handled automatically by noweave.

The Typeset Documentation

The best way to get a feel for the capabilities of noweb is by reference to the finished product: the typeset version of a program. Figure 1 represents the typeset version of a Perl script that actually extends noweb's functionality by providing a limited “autodefs” filter. This filter will recognize and mark package and subroutine names for automatic cross-referencing and indexing.

When looking at this example, one can quickly see how chunks of actual code are interspersed throughout the descriptive text. Each code chunk is uniquely identified by page number and an alphabetic sub-page reference. For example, in Figure 1, there are four code chunks on the first page labeled in the left margin as 1a, 1b, 1c and 1d.

Besides the marginal tag, the first line of each code chunk also has its name and a chunk reference enclosed in angle brackets at the left margin and perhaps cross-reference information at the right margin. Lets examine chunk 1b more closely—a reasonable facsimile of its first line is:

1b <

This line tells us that we are now in chunk 1b. The <Global variables 1a>+= construct tells us we are working on the chunk named Global variables whose definition begins in chunk 1a. The += indicates that we are adding to the definition of Global variables. At the right margin we encounter (1d) <1a 1c>, which means that the chunk we are defining is used in chunk 1d, and that the current chunk is continued from chunk 1a and will be further continued in chunk 1c. It should be noted that all of these visual cross-referencing clues—with the exception of the chunk name itself—are provided automatically by noweb.

At the end of any chunk there are two optional footnotes—a “Defines” footnote and a “Uses” footnote. A user can manually specify, in the nw source file, a list of identifiers (i.e., variables or subroutines) which are defined in the current chunk. In addition, some identifiers may be automatically recognized, if an “autodefs” filter for the programming language is used. There are autodefs filters available for many languages including C, Icon, TeX, yacc and Pascal).

These identifiers are listed in the “Defines” footnote below the chunk where their definition occurs, along with a reference to any chunks which use them. Any occurrence of a defined identifier is referenced in a “Uses” footnote below the chunk that uses that identifier.

For example, in Figure 1, we see that chunk 1c defines the term $index_prefix which is used in chunk 2b. A quick peek at chunk 2b verifies that, indeed, this term is used and appears in the “Uses” footnote for that chunk.

Chunk 1d, autodefs.perl, represents the top level description of our entire program. This chunk is referred to as a “root” chunk in noweb and is not used in any other chunk. Our example has but one root chunk, although as many as you wish can be defined in your nw source file, and notangle can extract each of them into separate files.

The first line of code in chunk 1d is the obligatory #!/usr/bin/perl line which must begin all Perl scripts intended to be invoked as an executable program. However, the next two lines are not lines of Perl code at all but instead are references to other named chunk definitions. The code from those referenced chunks will be inserted at this point in the executable program extracted by notangle. Thus, we have a broad overview of our program, uncluttered by the specific global variable initializations and subroutine definitions.

Looking at chunk 2a, which is included in our root chunk, we see that it also includes another chunk, chunk 2b. This demonstrates that the inclusion of chunks can be nested to practically any level and can occur in any order in the documentation (definitions need not precede uses).

Our documentation ends with two optional indices provided by noweb—an index of code chunks and an index of identifiers.

Writing the Program in Noweb

With the knowledge of what comes out the end of the pipeline in hand, we can now describe the structure of the nw source file itself. The nw source file for our example program is given in Listing 1.

When you write a noweb program, you alternate between explaining some piece of code and providing the formal definition of that piece of code. You must indicate whether you are entering documentation or code by using one of two noweb tags.

To begin writing documentation, one starts with an @ symbol in the left column, followed by either a space or a newline. This indicates that all of the text that follows, at least up to the next tag, is documentation text. All documentation text is passed through the filtering process to the (LA)TeX file. Thus, the author is responsible for providing any special formatting such as sections, tables, footnotes and mathematical formulae which may be desired or needed in the documentation.

In addition to the standard (LA)TeX command set, noweb provides three additional control sequences. Any text surrounded by double square brackets in the text is typeset in the same fashion as literal code, and the \nowebindex and \nowebchunks commands expand into the two types of indices shown at the end of our example in Figure 1.

To indicate the beginning of a code chunk, you use double angle brackets surrounding a name for the code chunk followed by an equal sign:

<<code_chunk_name>>=

Everything following this construct is considered to be literal code or a reference to another chunk name. You reference another chunk name by placing its name in double angle brackets with no trailing equal sign. As with documentation, a code chunk is terminated when another tag is encountered. To continue a code chunk definition, you start a new code chunk using the same name within the brackets as the chunk to be continued.

The special formatting and cross-referencing of code chunks is handled automatically by noweb and requires no special input by the user—with the one exception of manually specifying identifier definitions.

To manually list identifiers which are defined in a given chunk, you terminate that chunk with a line of the form:

@ %def

The identifiers given on the line will be placed in a “Defines” footnote for that chunk and will automatically be cross-referenced and indexed by noweb as described in the previous section.

The process by which notangle extracts the code into a form suitable for the compiler or interpreter follows just a few simple rules. A root chunk is specified on the command line as the chunk to be extracted and assembled. This chunk is then printed line by line until a reference to another chunk is encountered. At this point, the referenced chunk is output line by line—and similarly for any chunks referenced therein. When the referenced chunk has been output, notangle continues the process of outputting the root chunk.

When dealing with continued chunks—two or more chunks sharing the same name—notangle concatenates their definitions in order of appearance into a single, named chunk. The extracted code for our example program is in Listing 2, and it can be seen that all spacing and indentation is preserved appropriately in the executable version.

Because of the way notangle extracts and assembles its input, the program can be presented and explained in the best order for human understanding. notangle will make sure that the program chunks are in the right order for the compiler or interpreter.

The Incantations

Now that we know how to create a program in noweb, we can examine the methods of generating our typeset and executable versions of the program. The noweb distribution provides a general shell script called, remarkably, noweb which drives the notangle and noweave processes. However, this method of invocation, though simple, is somewhat limited. We will focus here on using each tool separately as this method provides a more flexible approach.

When you invoke notangle, you specify a chunk name (a root chunk) to extract and assemble from the nw source file. If you fail to specify a chunk, notangle searches for a chunk named * to extract (this is the default root chunk in a noweb program). The notangle tool writes to stdout, so you must redirect this to a file of your choice. The general form of the command is:

notangle [-R
        [-filter

Thus, to extract the executable version of our example program we use:

notangle -Rautodefs.perl autodefs.perl.nw >
         autodefs.perl
The -R option specifies which root chunk to extract. The -L option is used to embed line directives, if they are supported by the compiler/interpreter you are using. The line directives refer to locations in the nw source file. Thus, when debugging your code, you never need to refer to the executable version. Rather, you can edit the code in the nw source file. The default format is for use with the C preprocessor, but it also works well with Perl, with one catch. The line directives are emitted whenever a chunk is entered or returned to, and refer to the next line of code. Therefore, in a script such as ours, a line directive winds up as the first line of the executable version before the #! line, rendering it non-executable. The fix for this is to delete the first line directive, or to move it below the first line and increment the line number by one.

One can write filters for use with either notangle or noweave that manipulate the source once it is in the pipeline. The pipeline representation of the nw source file in noweb is beyond the scope of this article (see the “Noweb Hacker's Guide” included in the documentation of the distribution). However, it should be noted that a filter could easily be constructed which automates the solution to the line directive problem.

The typeset version of the program is generated with the noweave tool. There are several useful options for noweave, all detailed in the man pages. We will only consider a few of the most important options here.

The first option of general interest concerns the desired output: you can specify -latex (default), -tex or -html as the formatting language to be used for the final documentation. Each of these options will supply an appropriate wrapper (which can be suppressed with the -n option) for the typeset version. You can write your nw source file intended for LATeX typesetting and still have the option of producing an HTML document by invoking noweave with the -html option and the LATeX-to-html filter (-filter l2h) included with the distribution.

The -x option enables cross-referencing and indexing of chunk names, as well as any identifiers which are automatically recognized by an “autodefs” filter. Using the -index option implies -x and provides cross-referencing and indexing for manually defined identifiers—those mentioned in @ %def statements in the nw source file.

Normally, noweave inserts additional information, such as the filename for use in page headers, as part of its wrapper. The -delay option causes noweave to suspend the insertion of this information until after the first documentation chunk. This is most useful when you wish to provide your own (LA)TeX wrapper to specify additional packages or to define your own special formatting commands. This implies a -n (omission of wrapper) option and requires that you make sure to include a \end{document} control sequence in a documentation chunk at the end of the file to complete the wrapper. Our example nw source file is written in this fashion.

Our typeset version (Figure 1) was produced by first extracting the autodefs.perl root chunk with notangle and making it executable using the chmod system command. We then placed this executable in the noweb library directory and invoked noweave as:

noweave -autodefs perl -delay -index
        autodefs.perl.nw > autodefs.tex

We can then run LATeX on the resulting file—twice, to resolve page references—to create the dvi file, then use dvips to create the postscript version for inclusion with this article.

Additional options allow you to have the index created from an external file, to expand tabs and to specify alternative formatting options provided by the included noweb.sty file. The latter includes options to omit chunk numbering in the left margins, change text size in code chunks and switch from the symbolic cross-referencing of code chunks occurring at the right margin to simple footnote style cross-referencing similar in style to the “Defines” and “Uses” footnotes.

Conclusion

In general, a literate program takes more time and effort to initially produce. However, since much of this initial effort is devoted to explaining each part of the program, the author is likely to produce a better quality program in the end, because she has put more thought into the program's design at each stage of the game. Additionally, by investing in the extra effort of creating a well-documented program, the time spent later in maintaining and upgrading the program is considerably lessened.

In terms of documentation and explanation, the ability to describe components as they come into play in the design of the program—rather than in the order they must occur for the compiler or interpreter—is a vast improvement over traditional commented code. In addition to the benefits of improved code and easier maintenance, literate programs can also serve well as teaching tools.

Availability and Notes

Andrew Johnson is currently a full time student working on his Ph.D. in Physical Anthropology and a part time programmer and technical writer. He resides in Winnipeg, Manitoba with his wife and two sons, and he enjoys a good dark ale whenever he can. He can be reached at ajohnson@gpu.srv.ualberta.ca.

Brad Johnson is currently pursuing a degree in Statistics at the University of Manitoba.