At the Forge

Node.JS

Reuven M. Lerner

Issue #205, May 2011

Want to write high-performance network server applications? Node.JS uses JavaScript to do exactly that.

Back in 1995, a number of my coworkers and I went to a big event in New York City where Sun Microsystems, a major UNIX vendor at the time, was announcing its new programming language, Java. Java, of course, was impressive in many ways, but what wowed us was the ability to write “applets”, little Java programs that executed inside the browser. Also at that event was browser powerhouse Netscape Communications, who demonstrated a separate programming language that executed inside the browser. Netscape originally called the language LiveScript, but in the wake of the hype that Java generated, Netscape renamed it JavaScript.

Fast-forward to today, and it's amazing to see how much of this story has changed. Sun is no more, having been bought out by Oracle. Netscape is no more, although its crown-jewel browser has been turned into a leading open-source project. Java has become popular and ubiquitous, and there no longer is any need to convince programmers that it's worthwhile to learn. And, although in-browser applets still exist, they are a tiny fraction of what people now do with Java.

The most interesting part of this whole story is JavaScript. Originally meant to be a simple language put inside browsers, then renamed as part of a marketing effort, you could say that JavaScript had a troubled childhood. Each browser's implementation was slightly different, making it hard to write programs that would work on all browsers. Many implementations were laughably unstable or insecure. One friend of mine enjoyed demonstrating this with a Web page that contained a “while” loop that opened an infinite number of “alert” dialog boxes. Execution was fairly slow and used a large amount of memory. And, of course, there were all sorts of language features that were hard to understand, ambiguous, implementation-dependent or annoying. Adding insult to injury was the odd standardization process that JavaScript went through, giving it an official name of ECMAScript. (Of course, no one really calls it that.)

Nearly everything about JavaScript seems to have changed in the past few years. JavaScript used to be the language everyone used for lack of an alternative. Now, JavaScript is coming into its own. This is certainly true for client-side programming. The ease with which it's now possible to create good interfaces is a testament not only to front-end developers, but also to libraries, such as Prototype, MooTools and jQuery, that make it enjoyable, rather than painful, to work with JavaScript.

Because so many sites now use JavaScript extensively, the need for fast, stable JavaScript engines has grown dramatically. Each of the major open-source browsers (Firefox, Chrome and Safari) now has a team of specialists working to make JavaScript better in all ways, and the improvements are obvious to those who have upgraded their browsers in the past year. JavaScript is getting a great deal of love and attention, and you can expect further improvements during the coming months and years.

Some of these modern JavaScript implementations now are available outside the browser as independent libraries. This means if you want to create a non-browser program that uses JavaScript, you can do so without too much trouble.

About a year ago, a friend and colleague told me that JavaScript was starting to show some potential as a language for server applications. I laughed this off, saying it was probably a fad or a crazy project. After all, I asked him, who would want to use JavaScript as a server-side language, when we have such excellent languages and frameworks already?

Of course, the joke is on me. In the past year, more and more people have started to use JavaScript as a server-side language. This is due in no small part to the emergence of Node.JS, an amazingly fast engine for network applications written in JavaScript, which also was covered by Avi Deitcher in last month's LJ.

The secret to this speed isn't just JavaScript, although that's certainly part of the equation. Node.JS uses Google's V8 JavaScript engine, along with native C++ and JavaScript code. The other reason for Node.JS's high speed is that it is event-driven. Rather than handling incoming traffic with many different processes (à la classic Apache) or threads (modern Apache, as well as some other servers), Node.JS handles all incoming connections in a single process and a single thread. This form of programming is a bit strange at first, but it works very well—so well, in fact, a large community has formed around Node.JS with many plugins and extensions.

This month, I take a quick look at Node.JS, what you can do with it, and why its usage is growing, especially in high-demand Web applications. Even if you never end up using Node.JS in your own work, I assure you that after you've seen what it can do, it'll change your thinking about what JavaScript is and how you write Web applications.

Installation

Although it's common to think of Node.JS as a JavaScript program, it's actually an engine on top of which JavaScript programs run. Node.JS itself is actually an executable you must install onto your machines.

I'm normally a big fan of Ubuntu's packaging mechanism, which allows me to use apt-get install to fetch and install whatever software I want. Node.JS isn't yet available for Ubuntu 9.10, which I have running on my server, so I was forced to install it from source. Fortunately, that's quite simple to do, especially if you're familiar with the Git version-control system. First, I cloned the repository from GitHub:

git clone git://github.com/ry/node.git

Then, I compiled Node.JS by going into the node directory and running the standard commands for compiling source:


cd node
./configure && make && make test && make install

Note that when you compile Node.JS, you're compiling a program that includes the V8 JavaScript engine, so don't be surprised if it takes a while to compile on your machine. The default installation goes under /usr/local/, including /usr/local/lib/node, /usr/local/include/node and (for the executable) /usr/local/bin/node.

Now that it's installed, what can you do? Well, the traditional thing to do in any programming language is a “Hello, world” program. So let's look at one (modified from an example in the Node.JS documentation):

var http = require('http');

http.createServer(function (request, response) {
        var startTime = new Date().getTime();
    response.writeHead(200, {'Content-Type': 'text/plain'});
    response.write("line 1\n");
    response.end('Hello World\n');
    var elapsedTime = new Date().getTime() - startTime;
    console.log("Elapsed time (in ms): " + elapsedTime);
}).listen(8124);

console.log('Server running at http://127.0.0.1:8124/');

The first thing that comes to mind when I look at code like this is, “Wow, JavaScript can look like any other language!” Perhaps that's an odd thing to think or say, but I'm so used to seeing JavaScript inside an HTML page or (better yet) in a file of its own but inside unobtrusive document-ready blocks in jQuery, that seeing a server-side JavaScript program that doesn't reference the DOM even once is a new and strange experience.

The first line uses the require function, provided by CommonJS. CommonJS is an API that attempts to fill in the gaps left by the JavaScript standard, now that JavaScript is used beyond the browser. There are a number of implementations of the CouchJS standard, of which one is in Node.JS. One of the most useful aspects of the specification has to do with modules, allowing you to do in JavaScript what's taken for granted in other languages—putting a number of function and variable definitions into a file and then importing that file via a reference name into a program. With CommonJS installed, the require function is, thus, available. The first line puts all of the definitions from the http module into our http variable.

With that in place, you invoke the http.createServer function. This function takes one parameter—a function that itself takes two parameters: a request and a response. The request object contains everything you would expect in an HTTP request, including headers, parameters and the body. The response object, which is created by the server, contains the actual response headers and data.

If you are new to JavaScript, it might seem a bit odd that I'm passing a function as a parameter. (And, if you're not used to anonymous functions, you had better start now!) But I'm also not directly invoking that function. Rather, this is the way you tell Node.JS that when an HTTP request comes in via the server, your function should be invoked—and the HTTP request should be passed to the function's first parameter.

Indeed, this style is at the heart of Node.JS. You typically don't invoke functions directly. Rather, you tell the underlying infrastructure that when a request comes in, such and such a function should be invoked. This use of “callbacks” is already somewhat familiar to anyone who has used JavaScript in a browser. After all, a client-side JavaScript program is nothing more than a bunch of callbacks. But in the server context, it seems a bit different, at least to me.

Now, what does this callback function do? First, it gets the current time, in milliseconds and stores it in a variable (startTime). I'll use it later on to find out how long the execution took.

The callback then uses the built-in functions that have been defined for the response object to send data back to the user's browser. Several methods are available to use. response.writeHead sends the HTTP response code, as well as one or more HTTP headers, passed as a JavaScript object. response.write (which should be invoked only after response.writeHead) sends an arbitrary string to the user's browser. The response to the user needs to finish with a call to response.end; if you include a string as a parameter, it's the same as calling response.write with that string, followed by response.end.

The final thing that this function does is print, on the console, the number of milliseconds that have elapsed since it first was invoked. Now, this might seem a little silly when using a toy program like this one. But even when I used ApacheBench to make 10,000 total requests with 1,000 of them happening concurrently, Node.JS kept chugging along, handling each of these requests in either 0 or 1ms. That's pretty good from my perspective, and it matches the extreme performance others have reported with Node.JS, even on more sophisticated programs.

The call to createServer returns an HTTP server object, which I then instruct to listen on port 8124. From that point on, the server is listening—and each time it receives an HTTP request, it invokes the callback. At any given time, Node.JS is handling many simultaneous connections, each of which is sending or receiving data. But as a single-process, single-thread program, Node.JS isn't really doing all of this simultaneously. Rather, it's doing its own version of multitasking, switching from one task to another inside its own program. This gives Node.JS some pretty amazing speed.

npm and More Advanced Programs

What, you're not impressed by a high-speed “hello, world” program? I can understand if you're hesitating. And besides, the last few years have shown how powerful it can be to have a high-level abstraction layer for creating Web applications. Perhaps if you were writing low-level socket programs, it wouldn't be a problem for you to send each header and the contents. But maybe there's a way to have the high speed of Node.JS, while enjoying a high-level Web development library. Or, perhaps you're interested in building not a Web application, but something that'll be appropriate for a newer protocol, such as Web Sockets.

I've already shown that Node.JS supports the CommonJS standard for external modules, such that you can require a file and have its contents imported into a local variable. In order to promote the distribution and usage of many such modules, Isaac Schlueter created npm, the Node.JS package manager. npm doesn't come with Node.JS, but I expect this will change over time.

To install npm, simply run the following command (but not as root!) from the shell:

curl http://npmjs.org/install.sh | sh

If you find you cannot install it because of the permissions associated with the node.js directory, you should not install npm as root. Rather, you should change the permissions on the node.js directory (typically /usr/local/nodejs), such that you can install npm as a regular user.

Once you've installed npm, you can get a list of what's available with npm list. This lists all the packages, and at the time of this writing, there were more than 3,700 packages available, although I must admit that each version of a package counts toward the list.

To install one of these packages, simply type:

node install express

And sure enough, the npm module “express” is installed. I should add that it took me a while to get the permissions right, such that npm could install things into /usr/local on my server to which a nonroot user typically has limited rights. I hope these sorts of permission issues will go away in the future, perhaps by putting npm's files in a place other than /usr/local.

Now that you have installed this module, what do you do with it? You can write a simple Web application for starters. Express is designed to be much like Sinatra, a simple Web server for Ruby. Here's a simple “Hello, world” program in express, based on the express documentation:

var app = require('express').createServer();

app.get('/', function(req, res){
    res.send("Hello, world\n");
});

app.listen(3000);

In other words, you first require the express module. Because you downloaded express via npm, it is available to you automatically. You don't need to set any paths or options. You then get the result back from loading the module and immediately create a server, which you put into your app variable. app is what you will use throughout your application.

Then, you tell the application that when it receives a GET request for the '/' path, it should execute the function that you indicate. Notice that you don't have to deal with the low-level details of HTTP responses here. You simply can send your response and be done with it.

You then tell the application to listen on port 3000. You can save and run your application, and when you go to /, you get your greeting.

Well, what else can you do? I've expanded express.js a bit and put it into Listing 1. To begin with, you can see that by specifying a Rails-style route (/person/:id) with a colon in front of one of the path segments, you can set a parameter name that is retrieved automatically, and that is then available via app.params.id:

app.get('/person/:id', function(req, res){
    res.send('Oh, you want information about person ' 
    ↪+ req.params.id + "\n");
});

Going to /person/100 will result in the output:

Oh, you want information about person 100

which means that the parameter can be used as the key in a database, for example. (And if you wonder whether Node.JS can talk to a database, be aware that there are adapters for many of them—both relational databases, such as MySQL and PostgreSQL, and also non-relational databases, such as MongoDB, Redis and CouchDB.)

You aren't limited to GET requests:

app.post('/foo', function(req, res){
    res.send("You requested foo\n");
});

If you ask for /foo via a POST request, you will get this response. But if you ask for /foo via GET, you will receive a 404 error from Node.JS.

Finally, you also can use templates on the filesystem. One particularly Ruby-style template is called ejs, and it has a virtually identical syntax to Ruby's ERb (embedded Ruby), including the need for a “views” directory and for a layout. Create a views subdirectory, and put index.ejs in it, as per Listing 2. You then can do something like the following:

app.get('/file/:id', function(req, res) {
    res.render('index.ejs', {
      locals: {param: req.params.id}
    })});

Here, you're taking the parameter (which you're calling id), and you're passing it to your template (index.ejs) as the local name param. You then ask express to render your template with the variable in it. Sure enough, your template is rendered in all of its HTML glory, with the data that you passed to it.

Actually, that's not entirely true. Express looks for a layout, much as Rails templates do, and if it doesn't find a layout, it'll throw an exception. You could create a layout, but it's easier just to modify the express application's configuration. Do that by setting parameters inside app.set:

app.set('view options', {
  layout: false
    });

Once that is added, your template is rendered just fine.

Conclusion

Node.JS already has started to affect the way that people write Web applications and even how they think about writing Web applications. Some sites (such as GitHub) have moved toward Node.JS for specific, high-performance tasks. Others are looking to change over completely. I don't think I'll be using Node.JS for a Web application any time soon, but I can think of several other ways it would be useful. Node.JS already has had a huge impact on the world of Web developers, and it appears poised to continue to hold this position of leadership for some time to come. Certainly, the days when I scoffed at the notion of server-side JavaScript have long gone.

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.