At the Forge

Communication in HTML5

Reuven M. Lerner

Issue #202, February 2011

HTML5 gives Web applications new communication features.

Last month, I took an initial look at HTML5, the new standard for HTML that increasingly is supported by popular browsers. HTML5 includes a large number of different technologies and solutions, and it's being rolled out in a piecemeal fashion. Some parts of HTML, such as the abbreviated DOCTYPE declaration at the top of a document, can be used already. Other parts, such as new elements for HTML forms, are not quite ready yet for mainstream use, unless you are willing to use JavaScript to handle elements in browsers that lack support for those features.

For me, the improved form tags and expanded semantic markup for sections, headers and the like are both important and useful advances. But in many ways, I believe some of the most important improvements are a bit harder to understand and will take longer to catch on. These are improvements in how different pages may communicate with one another within the same browser or how the Web browser may communicate with outside resources.

To date, it has been difficult, if not impossible, for a program running in the browser (that is, with JavaScript) to create a mashup between two data sources or interact with another open window. This was because of a combination of historical, practical and security concerns, all of which were understandable. But today, we want our browsers to do more. Indeed, most modern applications are being developed for the browser, and if we can somehow push more information and intelligence to the browser, we'll both reduce the load on our servers and improve the responsiveness of the application.

A number of features in HTML5—and yes, some of these features aren't really part of the HTML5 specification, but I'll pretend they are for the purposes of this column—are designed to solve precisely this problem. These features aren't meant to make it easier to create Web pages, but rather Web applications. Specifically, I'm talking about interpage communication, WebSockets and threads, known as Web workers. Each of these topics probably deserves a column to itself, so I admit that the examples I provide here are meant to give you a taste of what's possible, rather than a comprehensive tutorial or example. Nevertheless, I hope you'll become excited about the possibilities raised here, and perhaps you'll even think of new and interesting ways to make use of these features.

Interpage Communication

The basic building block of the Web is the page, and more specifically, the page of HTML. Sure, modern Web browsers can display all sorts of different formats, but for all intents and purposes, when you talk about a Web app, you're talking about one or more pages of HTML. Nowadays, that page can be modified via JavaScript, using a variety of different techniques that often are lumped together with the Ajax name, regardless of how true or appropriate that is. Ajax has complicated things somewhat, removing the one-to-one correspondence that used to exist between HTTP requests and HTML pages. Although this means the flow of our applications has become more complicated, Ajax has made the Web a far more useful and friendly platform for users, and even for developers.

For many years, Web browsers were one-page-per-window affairs. If you wanted to browse three pages simultaneously, you needed to open multiple windows. Modern browsers all support the use of tabs, with some users (like me) abusing this feature quite a bit, opening dozens of tabs each day and then taking my sweet time to read and close the contents of each one. Given that all of these tabs (or windows) are running within the same program, it should be possible for them to communicate using a combination of JavaScript and HTML.

Perhaps this should be the case, but to date, it has not been possible. That's due to privacy concerns. You didn't want one Web page to be able to read from or write to another page without permission, and there wasn't a standard that would provide such permission. This is true not only in the case of two separate pages, but also in the case of two iframes on the same page, which might want to affect each other.

Now, if you're thinking you actually have been able to do this in the past without any hoopla, you might be right. It's true that an iframe can communicate with, and even modify, its parent window, but only if the two pages have the same origin. HTML5 changes the rules of the game by allowing pages to communicate with one another, even if they have different origins.

This works as follows. The sending page invokes the postMessage() method on the window or iframe that should receive the message, along with the expected origin of the receiver. For example, let's create a very simple HTML file that contains little more than an iframe (Listing 1). For now, ignore the JavaScript event handler that is defined there. I'll get to that in a bit.

Inside this page is an iframe. Now, for demonstration purposes, this iframe will have the same origin as the outer page. But in many cases, the iframe will come from a completely different origin. In HTML5, that doesn't matter at all. You can send a message to whichever recipient you like. If you look at the HTML source for iframe.html, you'll see how to accomplish this:

$(document).ready(function() {
$("#send-button").click(function() {
    window.parent.postMessage($("#text-to-send").val(), '*');
});
});

In this example, I use jQuery to grab the button whose ID is “send-button”. I then add an event handler to that button, indicating that when it is clicked, it should invoke window.parent.postMessage, sending the text contained inside the text field. I should note that the postMessage() method can be called on any window or iframe, and that it can send any text in its first parameter.

The second parameter indicates the origin of the recipient to whom you're sending this message. In this case, I have indicated that the recipient may have any origin by specifying a wild card. In production environments, it's probably safe to assume you will want to specify an origin. By stating the recipient's origin, there's a bit of additional safety—the message will be sent only if the receiving window object's content is from the stated origin.

On the receiver's end, the posted message arrives as an event, one which the receiver can (and should!) examine before using. Going back to atf.html, you will see how the receiver accepts a message in its event handler:

$(document).ready(function() {
window.addEventListener('message', receiver, false);
function receiver(e) {
    alert("origin = '" + e.origin + "'");
    alert("data = '" + e.data + "'");
        $("#message").text(e.data);
};
});

The event handler for this page indicates that it's willing to accept a message. Each message consists of two pieces, the message (the text string that the sender passed as a parameter to postMessage) and the origin (the sender's origin). Note that the sender cannot set its origin; this piece of information is handled by the browser.

Because the origin information is passed along with the message, it's possible for the receiver to filter out which origins it is willing to accept. In other words, although it's possible a rogue site will try to start posting to random windows that you might have open on other sites, the only way such messages actually will arrive is if the receivers are willing to accept them. I'm sure someone with more of a black-hat mentality than mine will find ways to defeat this security mechanism, but from what I can tell, it was thought out very carefully and cleverly, and should avoid most mischief.

Now that it's possible for any window to send messages to any other window, what can you do with it? The answer, of course, is that no one knows. Off of the top of my head, I can imagine chat clients—or more generally, using a single window on a Web browser as a communication switchboard and clearinghouse—grabbing feeds and incoming messages and putting them on the appropriate pages. Imagine if Facebook were to have a single iframe that would handle its (very large!) number of interactions with the server, and then handle all page updates through that iframe, rather than on each individual window or tab.

I also can imagine the postMessage() method ushering in a new age of multiwindow, desktop-like applications. Think of how many desktop applications now use multiple windows—one for control, another for each document and yet another for a “palette” of options. Now you can do the same thing with a Web browser, with a native message-passing interface.

Just what people will do with these capabilities is unknown, but I predict we'll see a rash of new, rich, browser-based applications that take advantage of them.

WebSockets

One of the greatest contributions of UCB (Berkeley) to the UNIX operating system was the introduction of sockets. Sockets allowed programmers to open a connection to another computer easily and quickly. Once opened, the socket operated something like a point-to-point file handle, allowing you to ignore the fact that data in the socket was being transmitted through dozens or hundreds of other computers. A large number of Internet services, from SMTP to FTP to HTTP, use sockets. I personally have used them for nearly 20 years to implement everything from my undergraduate thesis, to Web browsers and servers, to various Internet-enabled applications.

HTML5 brings socket-like connections to the browser, using a technology called WebSockets. WebSockets are similar in principle to UNIX sockets, in that you can open a connection to an arbitrary other point on the Internet, and send and receive data reliably without even considering the numerous hops or connections along the way.

Now, if you are an experienced Web developer, you might be wondering what the big deal is. After all, Ajax calls allow you to open HTTP connections and send and receive data. And xhr (the XmlHttpRequest function) has been around for a few years, and it works quite well. The difference is WebSockets will allow you to open one or more connections to anywhere on the Internet, not just to servers with the same origin as the current page. Moreover, WebSockets use their own protocol that is admittedly quite similar to HTTP, but it has a great deal less overhead. Finally, WebSockets remain open as long as the sides agree to do so—as opposed to HTTP, which is meant to be stateless and to be closed after a single request-response transaction. For all these reasons, communication using WebSockets generally is going to be far more efficient. A number of articles describing WebSockets have done the math and show just how much more efficient WebSockets are than HTTP—and the difference is staggering.

Working with WebSockets is remarkably simple. You open a WebSocket with some JavaScript code, often (presumably) fired either when a user performs an action (such as pressing a button) or when a certain event takes place (for example, a certain amount of time has elapsed). No matter what, you open a new WebSocket by specifying the URI to which you want to connect, starting with a protocol name (either ws or wss, for unencrypted or encrypted, respectively), continuing with the hostname, and then ending with a resource name.

Once the WebSocket is open, you can attach callbacks to it, indicating what should happen when the socket is opened, closed or receives a message. (Each time the WebSocket receives data from the remote host, it will invoke the “onmessage” callback function.)

For example, here's a simple WebSocket that retrieves data from a hypothetical weather server:

var weatherSocket = new WebSocket("ws://localhost:8080"); 
 ↪// Our own weather server

Then, you can assign callbacks:

weatherSocket.onopen = function(e) {
alert("Opened weather socket");
};

weatherSocket.onmessage = function(e) {
alert("Received a message: " + e.data);
};

weatherSocket.onclose = function(e) {
alert("Closing the weather socket...");
};

Finally, you can send messages by invoking the send() method—yes, the same method that you saw above, but without the second parameter indicating the origin.

Notice that although you write directly to the WebSocket using send, you don't read a result directly from it or via a return value to send(). Rather, you will get the data when it is sent to you, via the execution of your method at weatherSocket.onmessage().

One piece is missing from this description, namely a server to which the WebSocket connects. You cannot connect to just any old server on the other end, and especially not to an HTTP server. Fortunately, a growing number of packages (in various open-source languages) can handle the server side of WebSockets. One such package is the em-websocket gem for Ruby, based on the well-known eventmachine gem. WebSocket server libraries already exist for PHP and Python, as well as a number of other languages. Over time, I expect to see a number of WebSocket-compatible servers emerge.

How can you use WebSockets? As with interwindow communication, I expect the best applications and ideas haven't been developed yet. But once your Web browser can connect to any host on the Internet using a specialized high-performance protocol, you can imagine that the sky is the limit. Suddenly, Web-based chat servers no longer need to use kludges or hacks in order to allow for real-time chat. You can create mashups on the client, rather than the server. Combined with the new geolocation facilities in HTML5, you can have a map that updates your location in real time, using nothing more than HTML and JavaScript. It does mean that on the server side, Web applications now will require more than just installing Apache, but that has been true for a while now, as applications have become more sophisticated, so I don't think you need to worry about that too much.

Web Workers

Finally, another intriguing addition to HTML5 is the notion of Web workers, which you can think of as the browser equivalent of threads. Perhaps you have a complex task that needs to be handled in parallel with rendering of a page or of downloading information from the Internet. By splitting the work across two Web workers, you can take advantage of today's modern, multiprocessor computers to get a faster result. Because Web workers operate in the background, rather than on the thread that handles displaying the page, the page should be more responsive than if everything were in one thread.

Now, I must admit I generally have tried to avoid programming with threads, because of the many problems that can crop up whenever you have shared resources. Given that JavaScript was never designed to work with threads, my first thought when I heard about Web workers was how this possibly could work while keeping data safe and the browser stable. The solution appears to be sound, although it's still too early for me (and many others) to know for sure.

The idea is this. You launch a Web worker by creating a new Worker object:

var worker = new Worker("code.js");

Notice that you hand the name of a file to a Web worker. You cannot hand it a piece of code, either directly or by passing it a function reference. Perhaps it eventually will be possible for a browser-based application to create dynamically, and then store (using WebSockets?) a file on the server, but the main purpose of this restriction is to ensure that there is no chance for shared data among the various Web workers, thus avoiding the chance for issues traditionally associated with threads.

Indeed, Web workers operate almost as if they existed on different computers, with no direct connections between them. Workers cannot access the DOM, which means any elements on the page. Functions and data in the main thread are not available to the Web workers, and vice versa.

This raises the question of how the main thread and Web workers can communicate. The answer probably won't surprise you. They use postMessage(), the same mechanism for message passing that can be used to send information from one window or tab to another, regardless of origin.

I can foresee a number of uses for Web workers. First, they will allow browser-based applications to handle more than one thing at a time, ensuring that the main thread is used for rendering the UI and interacting with the user. Second, it means you can start to break problems apart, taking advantage of modern computer hardware that can put different threads on different processors intelligently. Finally, it means JavaScript now has the beginnings of a built-in message-passing mechanism. And, although programs still must remain inside a single browser, I have to assume at some point, it'll be possible to open a Web worker not just on your local machine, but on a remote one, as well.

Conclusion

Marc Andreessen, who wrote the original Mosaic browser and helped to popularize the Web as a founder of Netscape, claimed years ago that the browser is the new operating system. Even as Ajax and other advanced Web technologies have advanced during the past few years, and such amazing browser-based applications as Google Docs have emerged, I still have been skeptical of whether Web-based applications ever will truly rival their desktop counterparts. The addition of cross-window communication, WebSockets and Web workers go a long way toward convincing me that Andreessen's prediction has nearly come true.

HTML5 and its associated technologies include a wealth of new options for developers. It will take some time to figure out how well these work, how to get around the fact that not all browsers support them and just how useful (or not) various features might be. If you are a Web developer, I encourage you to study and work with these technologies as soon as possible. I already have changed the architecture of some of my applications as a result, and I wouldn't be surprised if that happens to you too.

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.