Expanding Options for Clustering

Ken Dove

Issue #82, February 2001

The role of Linux in the future of clustering.

Ten years ago, clustering meant one thing—grouping a small number of large servers together with shared disks to host on-line transaction processing applications with better availability and, in some cases, scalability. Since then, we've seen the rise of internet-based applications and thin commodity servers, and a whole new model of open, scalable, distributed computing built upon Linux.

Many vendors in the Linux space are working on reproducing the kind of clustering that existed ten years ago on traditional UNIX OLTP systems. This may not be the right strategy. Rather, clustering should look toward the types of applications and architectures that are being deployed in the internet data center and on sets of thin Linux servers. In the internet data center—whether in a hosting environment or in a corporation—application architectures have several characteristics that are fundamentally different from the architectures of the old glass house. In particular, the internet data center architectures share some important common themes:

  • Architectures are multitiered (usually web servers, application servers, and database servers).

  • Each tier may have literally dozens of servers.

  • Each tier has unique requirements for shared and replicated data.

In a large e-business or web content site, managing the entire distributed set of resources that provide a service over the Web is a bigger challenge than simply providing failover for individual servers.

While some applications, like OLTP or certain kinds of e-commerce, still require shared disk failover solutions, modern applications can, in many cases, take advantage of software-based replication provided by middleware. What they lack is the infrastructure to bind those solutions together and integrate them into the larger computing environment. Application management—including failover—remains one of the key problems facing today's operations manager, but the options are, fortunately, getting better.

The applications that do require shared access to data now often involve dozens of machines, potentially in multiple Points of Presence (POPs). This means that even the shared disk solutions of traditional clustering, which are based on physically shared SCSI connections, may not be adequate. Those solutions are bound by physical cabling and SCSI address limitations and usually can only service two or four systems.

The Web Server Tier

The web server tier, a stronghold of Linux, is characterized by either read-only or dynamically created data. This data can be, and often is, shared across multiple servers by using network-attached storage appliances running NFS. Each server mounts the NFS appliance and has access to the right data. For this tier, complicated shared SCSI cabling schemes are totally inappropriate—a TCP/IP network provides all the interconnect between servers and storage. However, clustering software that can monitor the health and performance of web servers, and take appropriate actions based on the input from those monitors, is still very important.
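
To make the monitoring idea concrete, here is a minimal sketch of a web-tier health check, not any particular vendor's clustering product. The hostnames and the "pull it out of rotation" reaction are hypothetical placeholders for whatever load balancer or DNS update mechanism a real deployment would use.

    import http.client

    # Hypothetical members of the web tier.
    WEB_SERVERS = ["web1.example.com", "web2.example.com", "web3.example.com"]

    def is_healthy(host, port=80, timeout=5):
        # Return True if the server answers a simple HTTP request with
        # anything short of a server error.
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", "/")
            return conn.getresponse().status < 500
        except (OSError, http.client.HTTPException):
            return False
        finally:
            conn.close()

    def check_tier():
        for host in WEB_SERVERS:
            if not is_healthy(host):
                # A real cluster manager would restart the service here or
                # pull the node out of the load-balancing rotation.
                print(host, "failed its health check")

    if __name__ == "__main__":
        check_tier()

A production monitor would of course run continuously, track response times as well as failures, and escalate through a configurable set of actions; the point is simply that the web tier needs watching, not shared SCSI cabling.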

The Application Tier

In the traditional corporate glass house, applications were typically client/server and were based on a relational database. For the most part, data was not stored in the application layer—everything was put in the database. The mix is much richer in the Internet/Linux world. The emergence of Java and Enterprise JavaBeans has led developers to serialize application-layer objects in the file system, outside the database. In addition, applications now manipulate a much richer universe of data—including digital streaming media, text and other data types that are not naturally stored in SQL back-ends.
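
The pattern described above is simply persisting application-layer state to files on shared or replicated storage rather than in the SQL back-end. The sketch below illustrates it; Python's pickle stands in for the Java object serialization mentioned in the text, and the mount point is a hypothetical NFS- or replica-backed directory.

    import pickle

    STATE_DIR = "/shared/appstate"  # hypothetical shared or replicated mount

    class Session:
        # Stand-in for an application-layer object (a shopping cart, a user
        # session) that never touches the relational database.
        def __init__(self, user, cart):
            self.user = user
            self.cart = cart

    def save_session(session, name):
        with open(f"{STATE_DIR}/{name}.pkl", "wb") as f:
            pickle.dump(session, f)

    def load_session(name):
        with open(f"{STATE_DIR}/{name}.pkl", "rb") as f:
            return pickle.load(f)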

For this reason, the application layer has shared data and failover requirements that are met by neither network-attached storage nor shared SCSI. Network-attached storage typically does not work because the NFS protocol does not provide consistency guarantees strong enough for workloads where multiple servers may be writing the same data. Shared SCSI is not sufficient because replication or sharing may need to span many systems or multiple POPs. In many cases, therefore, applications provide their own replication. For example, consider a foreign-exchange trading application in use at several dozen large banks: replication of real-time trading data is maintained at the application level, combined with cluster-based failover.
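
As a rough illustration of application-level replication (not the trading application's actual protocol), an application can push each update synchronously to a set of replica peers over ordinary TCP. The peer hosts and port below are hypothetical; a real deployment might spread them across several POPs.

    import json
    import socket

    # Hypothetical replica peers.
    REPLICAS = [("app2.example.com", 9000), ("app3.example.com", 9000)]

    def replicate(update, peers=REPLICAS, timeout=5):
        # Push one update record to every peer; return the peers that could
        # not be reached so the caller can decide how to react.
        failed = []
        payload = json.dumps(update).encode() + b"\n"
        for host, port in peers:
            try:
                with socket.create_connection((host, port), timeout=timeout) as s:
                    s.sendall(payload)
            except OSError:
                failed.append((host, port))
        return failed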

The Database Layer

The database layer is the domain of traditional clustering solutions from the big UNIX vendors—Sun, HP and IBM. Originally, two servers would be connected to some SCSI disks, with one server being an active master and the other being a passive standby. In more modern implementations, multiple servers share a fiber channel or SCSI storage system and use a parallel database like Oracle Parallel Server (OPS) to manage concurrent access by each node in the cluster.

However, high-end fiber channel SANs are expensive, as is software like Oracle Parallel Server. Often, the price/performance of these solutions is not appropriate for the thin-server Linux market or for the internet data center. The active/passive SCSI approach is also wasteful of resources and suffers from limited scalability. Fortunately, there are other approaches better suited to the kinds of applications being run on Linux.

For example, database vendors have put much effort into shared-nothing software replication approaches (Informix Replicator, DB/2 Replication, Oracle Multi-master and hot standby). Using these tools, it is possible for multiple servers to participate at the database tier without having shared storage. However, the databases themselves often lack the facilities to detect failure, sequence the actions to initiate failover and guarantee application integrity. For this reason, they are best paired with a clustering solution that provides these services.
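
The missing pieces—failure detection, failover sequencing, repointing clients—look roughly like the sketch below. The hostnames are hypothetical, the liveness probe is deliberately crude, and the promotion step is a placeholder for whatever command the database's own replication tooling provides (promoting an Informix or DB/2 replica, for instance).

    import subprocess
    import time

    PRIMARY = "db1.example.com"   # hypothetical hosts
    STANDBY = "db2.example.com"

    def alive(host):
        # Crude liveness probe; a real monitor would also query the database
        # itself rather than relying on ping alone.
        return subprocess.call(["ping", "-c", "1", "-W", "2", host],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0

    def failover():
        # 1. Confirm the old primary is really down (avoid split-brain).
        # 2. Promote the standby's replica to read-write using the
        #    database's replication tooling.
        # 3. Repoint the application tier at the new primary.
        print("promoting", STANDBY, "and repointing clients away from", PRIMARY)

    def monitor(interval=10, misses_allowed=3):
        misses = 0
        while True:
            misses = 0 if alive(PRIMARY) else misses + 1
            if misses >= misses_allowed:
                failover()
                break
            time.sleep(interval)

This is exactly the kind of glue a clustering engine supplies so that the database's shared-nothing replication can be trusted with production failover.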

A real-life example is a chip fabrication facility running a manufacturing automation application on Intel-based thin servers. Because of the value of this application, the chip manufacturer requires three-way replication, so that even if the primary fails, there are two machines ready to take over. The solution chosen was to use Informix database-replication services running on physically separated machines without shared storage in conjunction with a cluster management engine.

The Future of Shared Storage

Finally, some applications do require physically shared disks. However, because shared SCSI remains an unusual configuration, putting together complete systems using this approach requires care—often complete hardware/software system certification by a vendor. (When was the last time you checked your disk firmware level?) Even when this is all arduously put together, as we have seen, there are severe limitations on both the number of nodes that can share data over SCSI and possible physical cable layouts. Fiber channel relaxes some of these limitations but comes with drawbacks of its own, including steep price points, problems with multivendor interoperability and a set of management issues unfamiliar to many users of Linux thin servers.

The next year will see a huge change in the face of shared storage. Industry leaders are pushing two major new storage interconnects that will make shared storage available at a price point appropriate for thin servers, while also supporting sharing by the large node counts seen in internet data centers. Cisco and hot startups like 3ware are backing SCSI-over-IP, which will allow SCSI block protocols to run over a switched Ethernet fabric. Intel and the server vendors are lining up behind a new I/O standard called Infiniband. It will provide a switched I/O fabric that could eventually be implemented in the chip sets included on every commodity server motherboard. Indeed, these developments are complementary—Gigabit Ethernet cards will be able to sit on the Infiniband fabric and run the SCSI-over-IP protocols. As a consequence, we can look forward to having dozens or even hundreds of systems share storage over modern interconnects in the near future. Clustering solutions that provide the basis for using these interconnects to create truly manageable and scalable distributed solutions out of sets of Linux boxes will set the tone for the data center architectures of the future.

Ken Dove is chief architect of PolyServe, Inc. Previously, he was a distinguished engineer at IBM and before that principal software architect at Sequent Computer Systems for 12 years.