XSLT Powers a New Wave of Web Applications

Cameron Laird

Issue #95, March 2002

Cameron introduces XSLT and shows why it's such a hot topic in application development.

Extensible Stylesheet Language for Transformations (XSLT) is a computing language specialized for mapping XML documents into other XML documents.

Explanation of XSLT is no small ambition. The problem has to do with variety; there are many uses for XSLT, many instances of XSLT engines and many cooperating technologies involved in XSLT application, so it's important to focus on the essentials.

Universal XML

The first XSLT essential is its Extensible Markup Language (XML) base (also see the “Glossary of XSLT Terms” Sidebar). XML is the universal data format designed to encode everything: algorithmic data, programs and documents from purchase orders to biblical translations, in any human language, on any kind of computer and operating system. XML looks like HTML except it's a bit more complicated. In fact, one of XML's design goals is to generalize HTML in a way that preserves the comfort of HTML adepts. There's even a flavor of XML called XHTML that permits direct interpretation as HTML. Linux Journal frequently publishes articles on different aspects of XML.

A fully XML-ized world is a simpler one, in many ways. To analyze the operation of an accounts payable department, for example, you don't need to know who reports to whom, who is due for a three-week vacation and all those other messy human details. If you can draw a diagram that shows invoices coming in and payments going out, perhaps with authorization records spawned along the way, then you have abstracted what ought to be the essential information.

This is an intoxicating insight. It promises that a system that can transform one XML document (invoice) into one or more other XML documents (payment check, authorization records) and at least organizes, and possibly solves, all meaningful organization automations. That's why XSLT seems so important now.

Readers with a background in the XML world should generalize their project experience to get a notion of XSLT's true worth. Anyone with practical knowledge of XML knows that it's only the beginning of a solution, not the miracle cure marketing brochures often make it out to be. XSLT is exactly the same: a useful and even powerful way to organize the real work of engineering applications fit for production. The idea of transforming XML documents is an important one; to see whether it's the right idea requires close attention to the technical details.

An Engine of Your Own

To start you on your XSLT career and help you get the proper feel for the language, you'll need an engine, or language processor, of your own. The most widely used are based on Java and/or are proprietary. These often are integrated into larger server products: database servers, application servers and so on.

Rather than any of these, this article presents its examples in terms of the tDOM engine. tDOM has several advantages, among which the most important are that it's available under a liberal open-source license, it's exceptionally thrifty on memory and twice as fast as competing XSLT engines in our benchmarks, its installation is quick and compact and it exposes a scriptable command mode that's convenient for instruction. Moreover, tDOM fits well in the dual-level programming style explained below, and it's robust enough to be in production use at several demanding sites already.

To set up your own copy of tDOM, see the “How to Start XSLT Programming” Sidebar. That Sidebar concludes with a first example of XSLT use, invoked as

tclsh8.3 xslt.tcl example1.xml example1.xsl
example1.html

This command line says, “Use version 8.3 of the Tcl interpreter to launch the xslt.tcl program. The xslt.tcl utility applies the example1.xsl stylesheet to the example1.xml document and produces example1.html as its output.”

Look at this first as a machine that takes example1.xml as its input and produces example1.html, which has only a couple of lines:

<?xml version="1.0"?>
<datum>first message</datum>

Think of example1.html as an expansion of this into well-formatted HTML:

<html><body><h1>first message</h1></body></html>

XML as Data and Code

If all you need is a simple HTML document like example1.html, you can write it directly or use a lightweight macro language, rather than learn XSLT. The value of XSLT begins to appear when you look at more complex examples. You can set up the XSLT transformation to generate example1.html output in a particular style, perhaps with approved fonts or boilerplate site hyperlinks and disclaimers.

XSLT uses the language of stylesheets to specify these transformations. While stylesheets were in use before XSLT's invention, this article ignores other uses and consistently abbreviates XSLT stylesheet as just stylesheet.

On one level, a stylesheet is a program. Just as

int main()
{
    puts("Hello.");
}

is the source for a C program, a stylesheet is the source for an XSLT program. A peculiarity of stylesheets, though, is that they are themselves XML documents. Rather than looking like normal computer programs (in the way C, Java and ksh do, say), XSLT source is a kind of markup text (see Listing 1).

Listing 1. example1.xsl

With a verboseness typical of XML, this says, roughly, “act as a program that pulls out <datum> elements and puts their contents in <h1> headings of well-formatted HTML.” That's how example1.html is generated.

The application that implements this XSLT interpretation is itself a Tcl program. There's little you need to learn about Tcl at this point. tDOM exposes its XSLT engine with Tcl bindings, and the xslt.tcl script simply treats command-line variables as the filenames of XML documents and passes them on to the engine.

Let's review the example invocation. In

tclsh8.3 xslt.tcl example1.xml example1.xsl
example1.html

tclsh8.3 is the name of the executable program we're launching, and xslt.tcl is a minimal Tcl script that wraps the tDOM XSLT engine. If we wanted to improve the error handling of this utility, refinement of xslt.tcl would be the natural place to start.

Running xslt.tcl creates an XSLT processor that receives three filenames. The example1.xml file is a sample XML source document. This file names the stylesheet we apply to example1.xml. The process writes the resulting output document to example1.html. Select different logical contents for example1.html by naming a different XML source, perhaps example2.xml. To change the style of the output, rewrite example1.xsl.

You've now successfully run an XSLT program. All that's left to learn are the details of XSLT as a language and how it's applied to real-world problems. Before more on the syntax and semantics of XSLT, let's look at its uses.

One Language, Many Applications

Suppose you're responsible for a web site of tens of thousands of pages. You maintain those pages in an organization-specific XML vocabulary that strips out formatting information and HTML blemishes; your documents hold only the logical content specific to each page. Visitors need HTML, of course, but you generate that automatically, along with standard headers, frames, navigation bars, footnotes and all the other decorations we've come to expect on the Web. XSLT gives you the ability to update site style instantaneously for all the thousands of managed documents. Moreover, it partitions responsibility nicely between XML content files and XSLT stylesheets, so that different specialists can collaborate effectively.

That executive-level description masks quite a bit of implementation variability. Where and when does the XSLT transformation take place? You might have a back end of XML documents, which you periodically process with a command-line XSLT interpreter to generate static HTML documents served up by a conventional web server. You might keep the XML sources in a database, from which they're retrieved either as XML, as transformed HTML or even as full-blown HTTP sessions. Various application servers, content managers and even XML databases provide each of these interfaces. Another variation is this: you might keep only sources on your server and, with the right combination of HTML extensions and browser, direct the browser itself to interpret the XSLT you pass. You can make each of these steps as dynamic as you like, with caching to improve performance, customization to match browser or reader characteristics and so on.

This multiplicity of applications makes vendor literature a challenge to read. We all adopt different styles of Java programming depending on whether we're working on applets, servlets, beans and so on, even though all these fit the label of Java web software. Similarly, it's important to understand clearly what kind of XSLT processing different products offer.

Complex Site Development

Neil Madden, an undergraduate at the University of Nottingham, has an XSLT system tuned for especially rapid deployment and maintenance. His scheme is organized around multisection sites, authored by teams of administrators, editors and users. He uses TclKit, an innovative open-source tool that combines database and HTTP functionality in a particularly lightweight, low-maintenance package. TclKit also knows how to interpret Tcl programs, so he wraps up tDOM with standard templates into a scriptable module. With this, he begins site-specific development:

  1. Design an XML document structure that captures the content of the site's data.

  2. Compose XSL stylesheets to transform data to meet each client's needs.

  3. Repeat steps one and two for each section that needs special requirements.

  4. Add users, sections and pages.

Scripted documents encapsulate these bundles of different kinds of data (site structure, XML sources, stylesheets) and make it easy to update and deploy a working site onto a new server or partition. Madden has plans to offer not just web-based editing but also a richer, quicker GUI interface. Tcl's uniformity and scriptability make this dual porting through either web service or local GUI practical.

Well-defined module boundaries are essential to the system. Designers maintain stylesheets, administrators manage privileges and editors assign sections without collision. With all of the functionality implemented as tiny scripts that glue together reliable components, it's easy to layer on new functionality. Madden's medium-term ambitions include a Wiki collaborative bulletin board, and XSP and FOP modules for generation of high-quality presentation output. Madden proudly compares his system to Cocoon, the well-known, Apache-based, Java-coded XML publishing framework. At a fraction of the cost in lines of code, his system bests Cocoon's performance by a wide margin.

Even further along in production use of tDOM XSLT is George J. Schlitz of MediaOne. He prepares financial documents with XSLT in a mission-critical web environment. While he originally began publication with Xalan, performance requirements drove him to switch to tDOM.

The fundamental point in all this is to be on the lookout for XML-coded or XML-codable data. Chat logs, legal transcripts, printer jobs, news photographs, screen layouts, genealogical records, game states, application designs, parcel shipments, medical files and much, much more are all candidates for XML-ization. Once in that format, XSLT processing is generally the most reliable and scalable way to render the data for specific uses.

Learning XSLT

We still have a lot to learn about XSLT as a profession. As fast as its use is expanding, there remain fewer programmers competent in XSLT than, say, Object Pascal.

Another hurdle in XSLT's diffusion, along with its unconventional XML-based syntax and confusing deployment, is its functional or applicative semantics. Most computing languages that appear in Linux Journal are more or less procedural: Java and C programs instruct a processor to perform one operation, then another, then another. Proceduralism is wrapped up in the how of computation.

XSLT is related to Lisp in its functionality. Good XSLT programs express the “what” of a desired result. Instead of a focus on sequence-in-time, XSLT operates on whole XML documents or well-formed fragments to yield results. This is called functional (to evoke the model of mathematics), where functions turn inputs into outputs without side effects or variation in time. Moreover, mathematical functions can be composed (stacked) in combination. Typical XSLT semantics express several different transformations without specification of their sequence-in-time. Stylesheets are applied simultaneously.

XSLT has variables, but they are immutable. They can receive only one value and cannot, in particular, be looped, as in

for (i = 0; i < 10; i++);

Parametrized or repetitive operations are done through explicit recursion and iteration. The syntax of XSLT variable use is rather ugly, as it must fit within XML's constraints. In C or Java we might write

if (level > 20)
    code = 3;
else
    code = 5;
To approximate that in XSLT, we need
<xsl:variable name = "code">
     <xsl:choose>
         <xsl:when test = "$level &gt; 20">
             <xsl:text>3</xsl:text>
         </xsl:when>
         <xsl:otherwise>
             <xsl:text>5</xsl:text>
         </xsl:otherwise>
     </xsl:choose
</xsl:variable>
Such exercises illustrate that, while XSLT has enough abstract power to handle general problems by itself, it's often best used in a dual-programming mode. XSLT's strengths generally lie in template processing, pattern matching and sorting and grouping XML elements. Steve Ball, a principal for consultancy Zveno Pty. Ltd., does as much as practical with XSLT, then embeds it in an application with another language to handle interfaces to external systems, including filesystems and user view.

Most popular among the developers I've encountered during the last six months are Java and Tcl, although Python, C, Perl and other partner languages also manage XSLT engines more or less adequately. Moreover, XSLT also defines extensibility mechanisms that allow developers to provide new semantics within XSLT: “extension elements”, “extension functions” and “fallback processing”.

XSLT's ultimate destiny remains unclear. In this, also, it's a bit like Java. Five years ago, Java's purpose appeared to be to construct cute visual applets. As we've discovered since then, heavy-duty enterprise servers actually make better hosts for the best Java programming. We're still at an early stage in deciding when to use XSLT. ReportLab, Inc., for example, is an enterprise vendor that delivers products and services having to do with high-quality report generation to some of the largest organizations in the world, including Fidelity Investments and American Insurance Group. ReportLab founder Andy Robinson explained to me the deep experience his development team has in projects that transform XML. Each project his company has fulfilled has involved coding in a lighter-weight scripting language rather than relying on XSLT. Even though XSLT is specialized for XML transformation, his consulting teams have found it easier to use Python as a more general-purpose, but powerful language.

Acknowledgements

My special thanks to Rolf Ade, who contributes both to tDOM software and especially to my understanding of it.

How To Start XSLT Programming

XSLT Study

Resources

A Glossary of XSLT Terms

email: claird@starbase.neosoft.com

Cameron Laird is a full-time developer and vice president of Phaseit, Inc. He also writes frequently on programming topics and has published several articles during the last year on XSLT. He's currently preparing a training course on the language.