UpFront

diff -u: What's New in Kernel Development

Zack Brown

Issue #157, May 2007

It looks like MinixFS version 3 will be supported in modern kernels. Daniel Aragones has had a patch floating around, and Andries Brouwer recently cleaned it up and made it fully kernel-worthy. Andrew Morton has said the patch seems harmless and could be accepted at any time.

A new stackable filesystem, called RAIF (Redundant Array of Independent Filesystems), is under development by Nikolai Joukov and other folks. This filesystem performs data replication across multiple disks like RAID, but it does so on top of any set of other filesystems the user wants to incorporate into the RAIF structure. The project still is not ready to be considered for inclusion in the official kernel, and people brave enough to experiment with it should back up their data beforehand. Still, Nikolai says RAIF has reached a level of stability at which playing with it may be more fun than frustrating.

SDHC (Secure Digital High Capacity) Flash cards may soon be supported in the kernel. Philip Langdale has seen some very good results (in other words, no lost data in recent tests) with his newly written driver. John Gilmore donated an SDHC card for Philip to test on, and the SD Card Association has apparently published useful specs, which Philip has put to good use.

The desire of some folks to gain access to crash reports, even when running the X Window System, has raised the issue of whether to start migrating graphics drivers into the kernel, instead of leaving them to the X people. This would represent a fairly massive shift in kernel development, but it may be the best way to ensure that oops information is properly displayed, regardless of what graphics mode a system happens to be in when it crashes. D. Hazelton and probably others are working on this, but there definitely would be massive and long-term flame wars before any such transition could occur in the kernel.

Karel Zak has decided to fork util-linux away from the current maintainer, Adrian Bunk, after he and others were unable to get any response from Adrian regarding their patches and proposals to change maintainership. Unless Adrian makes a claim that he should remain the official maintainer, it's likely that Karel eventually will reintegrate his code with Adrian's, leaving a single code base once again, with Karel as maintainer.

Greg Kroah-Hartman has started a mailing list for anyone who packages the kernel for a distribution. The purpose of the list is to provide a vendor-neutral place to discuss bugs and other issues associated with packaging kernels. It also may provide a more visible way for kernel packagers to submit their own changes upstream to the developers.

Return of the Luggable

Doc Searls

Issue #157, May 2007

If Linux is the ultimate hermit crab operating system—born without a home but able to live well in anybody's hardware—it'll be fun to see how well Linux lives in the funky form factors provided by Acme Portable Machines, Inc.

Acme's hardware is built for places where the practical outweighs the pretty by ratios that verge on the absolute: server rooms, factory floors, military aircraft and medical facilities. Acme makes KVM (keyboard/video/mouse) switches that slide on rails into racks and control multiple CPUs (among many other connected things). It makes kiosk PCs. It makes tower CPUs with the dimensions of portable sewing machines, featuring flat screens and keyboards that open out the side and slots inside that hold up to eight full-size cards.

But, perhaps the most interesting items are Acme's portable systems—luggable workstations that look like scary briefcases and boast features such as “flame resistant” cases. Acme calmly calls its EMP “a robust lunchbox computer built using heavy-duty metal to provide a tough, go-anywhere unit ideally suitable for harsh/severe environments and mission-critical applications”. Jim Thompson of Netgate calls it “the ultimate LAN party box”.

EMP-370

Acme Portable's primary market is OEMs, but you can see and buy its goods directly at acmeportable.com.

LJ Index, May 2007

Doc Searls

Issue #157, May 2007

1. Billions of US dollars cable operators will spend by 2012 improving digital network capacity: 80

2. Millions of US dollars quoted in the fiber-based build-out of a San Francisco municipal high-speed Internet utility: 500

3. Percentage rate of increase in fiber-to-the-home (FTTH) subscriptions in Japan: 88

4. Millions of Japanese FTTH subscriptions in March 2005: 5.4

5. Effective radiated power in watts of KRUU “open-source radio”: 100

6. Range in miles of KRUU's city-grade signal: 4

7. Number of planets served by KRUU's live Web stream: 1

8. Total US dollars paid to AT&T for continuous use of a push-button phone since the 1960s by an 88-year-old: 7,500

9. Millions of cars at the end of 2006: 800

10. Millions of PCs at the end of 2006: 850

11. Billions of Internet connections at the end of 2006: 1.1

12. Billions of credit cards at the end of 2006: 1.4

13. Billions of TVs at the end of 2006: 1.5

14. Billions of cell phones at the end of 2006: 2.7

15. Billions of cell phones in use by September 2006: 2.5

16. Millions of cell phones added in the prior year: 484

17. Percentage of new cell phones added in Asia: 41

18. Billions of cell phones expected by the end of 2007: 3

19. Projected billions in annual cell phone shipments in 2008: 1

20. Estimated billions of human beings in July 2006: 6.525170264

1: ABI Research

2: “Fiber Optics for Government and Public Broadband: A Feasibility Study Prepared for the City and County of San Francisco, January 2007”, by Communications Engineering & Analysis for the Public Interest

3, 4: Broadband Properties, December 2006

5: FCCInfo.com

6: radio-locator.com

7: KRUU

8: The Consumerist

9–14: Tomi T. Ahonen and Alan Moore in Communities Dominate Brands

15–18: Wireless Intelligence, via Cellular News

19: Gartner via windowsfordevices.com

20: CIA's World Factbook

KRUU Models Open-Source Radio

Doc Searls

Issue #157, May 2007

KRUU is a community FM station of the new “low power” breed that is intended to serve local areas with noncommercial programming (www.fcc.gov/mb/audio/lpfm). As the FCC puts it, “The approximate service range of a 100-watt LPFM station is 5.6 kilometers (3.5 miles radius)”. In KRUU's case, that range nicely covers the town of Fairfield, Iowa. The station's Web stream, however, is unconstrained by those physical limitations. I listen to it in Santa Barbara, and it's already one of my faves.

The station's About page tells why it's especially relevant and cool:

KRUU's commitment to community also extends to the software and systems that are in place at the station. All the computing infrastructure uses only Free Software (also sometimes termed open-source software). The Free in this case is in reference to freedom, and not cost—all the software comes with the underlying source code, and we contribute all our changes, edits and suggestions back to the Free Software community. The reasons for using Free Software go far beyond the scope of cost. KRUU wishes to build local knowledge using systems that do not impose restrictions or limitations on use. To this end, we support software that is licensed under a “copyleft” or “open” clause, and content that is licensed under Creative Commons.

The word “open” runs through everything KRUU plays and values. From 5–6 am, seven days a week, it runs the “Open Source Radio Hour”. And, that's just the tip of the freeberg.

For listening with Linux, it profiles XMMS, Banshee, Amarok, VLC and Rhythmbox. KRUU streams in MP3, but it also podcasts in Ogg Vorbis. I listen to a lot of radio on-line, and I don't know of a station that's more committed to free software and open-source values than this little one. Find them at kruufm.com.

Parallel NFS (pNFS) Bridges to a Mature Standard

Larry Jones

Issue #157, May 2007

Thanks to the emergence of low-cost Linux clusters, high-performance computing (HPC) is no longer the domain solely of an elite group of public-sector-funded laboratories. In fact, HPC now can be found addressing challenges ranging from simulating the behavior of the entire earth to meeting the simulation needs of an individual product designer.

But, as clusters have become more prevalent, new challenges have emerged. The first challenge is to define a storage and I/O architecture that is not only capable of handling the vast amount of data created and consumed by these powerful compute engines, but that also is capable of keeping those engines fully utilized and fed with data. Without data, the largest and fastest supercomputers become nothing more than expensive space heaters.

The second challenge revolves around making the data generated by clusters easily available to other systems and users outside the cluster itself. Copying or moving data to other systems is clearly an option but involves inherent overhead cost and complexity. Ideally, any node on the network should be able to access and process data where it resides on the cluster.

Initially, clusters used the ubiquitous NFS standard, which has the advantages of being well understood, supported by almost every vendor and providing easy access to data for systems and users outside the cluster. However, NFS moves all data and metadata through a single network endpoint (the server), which quickly creates a bottleneck when trying to cater to the I/O needs of a cluster. The result is that neither bandwidth nor storage capacity scales—a new solution is required.
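To make that bottleneck concrete, here is a back-of-envelope sketch in Python. Every number in it is an illustrative assumption, not a measurement:

# Back-of-envelope look at the single-server NFS bottleneck.
# All figures below are hypothetical, chosen only for illustration.

server_link_mb_s = 100        # one NFS server NIC, roughly GigE speed
per_node_demand_mb_s = 25     # sustained I/O each compute node wants
nodes = 64                    # cluster size

aggregate_demand = nodes * per_node_demand_mb_s   # what the cluster wants
per_node_share = server_link_mb_s / nodes         # what each node gets

print(f"Aggregate demand: {aggregate_demand} MB/s")
print(f"Single server delivers {server_link_mb_s} MB/s total,")
print(f"about {per_node_share:.1f} MB/s per node, "
      f"a {aggregate_demand / server_link_mb_s:.0f}x shortfall.")

Adding nodes only widens the shortfall, which is exactly the scaling failure described above.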

Parallel filesystems, which enable parallel access directly from server nodes to storage devices, have proven to be the leading solution to this scalability challenge. Although parallel filesystems are relatively new, the technology clearly will become an essential component of every medium- to large-scale cluster during the next few years. Several parallel filesystem solutions are available today from vendors such as Panasas (ActiveScale PanFS), IBM (GPFS), EMC (HighRoad) and Cluster File Systems (Lustre).

Government, academic and Fortune 500 customers from all over the globe have embraced parallel filesystem solutions; however, these solutions require that customers lock in to a particular vendor for the software and sometimes the hardware. Wouldn't it be nice to have a filesystem that has the same performance as these vendor-specific solutions but that is also a true open standard? Then, you could reap the performance benefits of parallel access to your data while enjoying the flexibility and freedom of choice that come from deploying a universally accepted standard filesystem.

This introductory article discusses Parallel NFS (pNFS), which is being developed to meet these needs. pNFS is a major revamp of the NFS standard and has gained nearly universal support from the NFS community.

When people first hear about pNFS, their initial reaction sometimes is that it is an attempt to shoehorn a parallel capability into the existing NFS standard. In reality, it is the next step in the evolution of NFS, born of the understanding that organizations need more performance without giving up a multivendor standard. The NFSv4.1 draft standard contains a draft specification for pNFS that is being developed and demonstrated now.

Panasas is the author of the original pNFS proposal. Since this original proposal was written, a number of other vendors, notably EMC, IBM, Network Appliance and Sun, have joined to help define and extend pNFS. Other vendors are contributing as well, so pNFS is gaining broad momentum among vendors.

Because pNFS is an evolution of the NFS standard, it will allow organizations that are comfortable with NFS to achieve parallel performance with a minimum of changes. Plus, because it will become part of the NFS standard, it can be used to mount the cluster filesystem on the desktop easily.

NFSv4.0 improved on the security model of NFSv3, which is still the most widely deployed version today, and it folds in file locking, which previously required a separate protocol. NFSv4.0 also has an extensible architecture to allow easier evolution of the standard. For example, the proposed NFSv4.1 standard evolves NFS to include a high-speed parallel filesystem. The basic architecture of pNFS is shown in Figure 1.

Figure 1. pNFS Architecture

The pNFS clients mount the filesystem. When they access a file on the filesystem, they make a request to the NFSv4.1 metadata server that passes a layout back to the client. A layout is an abstraction that describes where a file is located on the storage devices. Once the client has the layout, it accesses the data directly on the storage device(s), removing the metadata server from the actual data access process. When the client is done, it sends the layout back to the metadata server in the event that any changes were made to the file.
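As a minimal sketch of that mount/layout/direct-access sequence, here is the flow in Python. The classes and method names are invented for illustration and do not correspond to actual NFSv4.1 operations:

# Illustrative-only model of the pNFS access sequence described above.
# MetadataServer, StorageDevice and their methods are invented for this
# sketch; they are not real NFSv4.1 protocol operations.

class StorageDevice:
    def __init__(self):
        self.blocks = {}

    def read(self, key):
        return self.blocks.get(key, b"")

    def write(self, key, data):
        self.blocks[key] = data

class MetadataServer:
    """Hands out layouts; never touches file data itself."""
    def __init__(self, devices):
        self.devices = devices

    def get_layout(self, path):
        # A layout tells the client *where* the file's data lives.
        return [(dev, f"{path}#stripe{i}") for i, dev in enumerate(self.devices)]

    def return_layout(self, path, layout):
        pass  # record any metadata changes when the layout comes back

# Client side: one metadata round trip, then direct I/O to storage.
devices = [StorageDevice() for _ in range(3)]
mds = MetadataServer(devices)

layout = mds.get_layout("/data/results.dat")    # ask the MDS for a layout
for dev, key in layout:                         # talk to storage directly
    dev.write(key, b"chunk")
mds.return_layout("/data/results.dat", layout)  # hand the layout back

The point of the sketch is the shape of the traffic: one metadata round trip, then data I/O that bypasses the metadata server entirely.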

This approach may seem familiar, because both Panasas (ActiveScale PanFS) and Cluster File Systems (Lustre) use the same basic asymmetric metadata access approach with their respective filesystems. It is attractive because it gets the metadata server out of the middle of the data transaction to improve performance. It also allows for either direct or parallel data access, resulting in flexibility and performance.

Currently, three types of storage devices are slated for support as part of pNFS: block storage (usually associated with SANs, such as EMC and IBM), object storage devices (such as Panasas and Lustre) and file storage (usually associated with NFS file servers, such as NetApp). The layout that is passed back to the client is used to access the storage devices. The client needs a layout driver so that it can communicate with any of these three storage devices, or possibly a combination of the devices, at any one time. These storage devices can be products such as an EMC SAN, a Panasas ActiveScale Storage Cluster, an IBM GPFS system, NetApp filers or any other storage systems that use block storage, object storage or file storage. As part of the overall architecture, the intent is to have standard, open-source layout drivers for the block storage, object storage and file storage back ends. There will be other back ends as well. For example, PVFS2 was used in the first pNFS prototype as the back-end storage.

How the data is actually transmitted between the storage devices and the clients is defined elsewhere. For better performance, the data can travel over RDMA (Remote Direct Memory Access) protocols, such as InfiniBand SDP. It also can be carried as SCSI Block Commands (SBC) over Fibre Channel, as SCSI Object-based Storage Device (OSD) commands over iSCSI, or over the Network File System (NFS) itself.

The “control” protocol shown in Figure 1 between the metadata server and the storage is also defined elsewhere. For example, it could be an OSD over iSCSI.

The fact that the control protocol and the data transfer protocols are defined elsewhere gives great flexibility to the vendors. It allows them to add their own value to pNFS, whether that is better performance, manageability or fault tolerance, or any other feature they want to address, as long as they follow the NFSv4.1 standard.

A natural question people ask is “how does the proposed pNFS standard avoid vendor lock-in?” One of the primary aspects of pNFS is that it has a common filesystem client regardless of the underlying storage architecture. The only thing needed for a specific vendor's storage system is a layout driver. This is very similar to how other hardware is used in Linux—you use a driver to allow the kernel to access the hardware.
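To illustrate the driver analogy, here is a hedged Python sketch of a common client programmed against one small interface, with a pluggable driver per storage type. The LayoutDriver interface and its subclasses are hypothetical, not part of any pNFS implementation:

# Sketch of the "common client + pluggable layout driver" idea.
# The LayoutDriver interface and its subclasses are hypothetical.

from abc import ABC, abstractmethod

class LayoutDriver(ABC):
    """What the generic pNFS client needs from any storage back end."""

    @abstractmethod
    def read(self, layout, offset, length): ...

    @abstractmethod
    def write(self, layout, offset, data): ...

class BlockLayoutDriver(LayoutDriver):   # SAN-style back ends (e.g., EMC)
    def read(self, layout, offset, length):
        return f"SBC read {length}B @ {offset}"
    def write(self, layout, offset, data):
        return f"SBC write @ {offset}"

class ObjectLayoutDriver(LayoutDriver):  # OSD back ends (e.g., Panasas)
    def read(self, layout, offset, length):
        return f"OSD read {length}B @ {offset}"
    def write(self, layout, offset, data):
        return f"OSD write @ {offset}"

class FileLayoutDriver(LayoutDriver):    # NFS file servers (e.g., NetApp)
    def read(self, layout, offset, length):
        return f"NFS READ {length}B @ {offset}"
    def write(self, layout, offset, data):
        return f"NFS WRITE @ {offset}"

def client_read(driver: LayoutDriver, layout, offset, length):
    # The client code is identical no matter whose storage sits underneath.
    return driver.read(layout, offset, length)

print(client_read(ObjectLayoutDriver(), layout=None, offset=0, length=4096))

Swapping vendors means swapping the driver object; the client function never changes, which is the lock-in-avoidance argument in miniature.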

Parallel NFS also works well for the vendors, because it allows their storage to work with a variety of operating systems without porting their whole proprietary filesystem stack. Because NFSv4.1 will be a standard, the basic client should become available on a wide variety of operating systems, and the only piece the vendor has to provide is the layout driver. Writing a driver is generally far easier than porting and supporting a complete filesystem stack across operating systems.

If you have a current parallel filesystem from one of the storage vendors, what does pNFS do for you that the vendor does not? Initially, pNFS is likely to perform more slowly than a proprietary filesystem, but the performance will increase as experience is gained and the standard pNFS client matures. More important, pNFS allows you to mount the filesystem on your desktop with the same performance that the cluster enjoys. Plus, if you want to expand your storage system, you can buy from any vendor that provides a driver for NFSv4.1. This allows your existing clients to access new storage systems just as your computers today access NFS servers from different vendors, using the filesystem client software that comes with your UNIX or Linux operating system.

Parallel NFS is well on its way to becoming a standard. It's currently in the prototyping stage, and interoperability testing is being performed by various participants. It is hoped that sometime in 2007 it will be adopted as the new NFS standard and will be available in a number of operating systems.

If you want to experiment with pNFS now, the Center for Information Technology Integration (CITI) has some Linux 2.6 kernel patches that use PVFS2 for storage (www.citi.umich.edu/projects/asci/pnfs/linux).

Unlocking the Music Locker Business

Doc Searls

Issue #157, May 2007

Thanks to Linux and other open-source building materials, on-line storage has become a commodity. Last year, Amazon made it available in a general way through S3. Then, in February 2007, Michael Robertson's MP3tunes did the same for one vertical breed of storage, the music locker. The name of the service is Oboe, and it stores unlimited amounts of audio for you. The free service is supported by advertising. The $39.95 US annual service offers customer support, allows bigger files and works as an offsite NAS.

After the Oboe news broke, we caught up with Michael for a brief interview.

LJ: Are the servers you're using Linux ones? (I assume so, but need to ask.)

MR: Yes, of course. No Microsoft in the house. With massive storage needs, it's imperative we have an industry-leading cost structure, and you get that only with LAMP. You can't be at a cost disadvantage to your competitors.

If you look at the battle for e-mail, it's a great illustration. Hotmail had expensive EMC hardware that let it get scooped by Yahoo, which was based on Veritas. Then Google raised the ante, and thanks to LAMP technology, it was able to make a storage offer that was, and is, costly for Yahoo and Microsoft to match. Music lockers are even more storage-intensive, so cost is an even bigger issue. Relying on LAMP lets us ride the cost efficiency of declining storage costs.

LJ: What can you tell us about the other technology involved?

MR: CentOS—three hundred terabytes of storage. We deploy new servers every week. We're standardized on 750GB drives, but we'll move to 1TB drives later this year. A big issue for us is power and floor space, not processing performance.

LJ: Your music lockers seem similar in some ways to Amazon's S3, the unlimited storage of which is being used as a business back end for companies that sell offsite backup. Is there any chance you'll look to provide a back end to ad hoc or independent music services? What I'm thinking here is that a big back end can make possible a number of businesses that do not yet exist, and open the music industry to much more entrepreneurship and competition at every level.

MR: Yes, there are similarities but also differences. Both services have APIs that open them to a wide range of software and hardware applications (see mp3tunes.com/api). But, MP3tunes is tailored to music delivery. And, I'd contend that music is a very unique data type because of its repeat usage from many locations. So, built in to the API are a wealth of audio-related features, such as transcoding formats, down-sampling bit rates, meta-tags and cover art. Here's an example: Mp3tunes.com is a mobile interface that can stream or download songs from your locker directly to mobile devices, seamlessly changing format and bit rate.
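The real interface is documented at mp3tunes.com/api. Purely as a hedged illustration of the kind of request Michael describes, a locker client asking for server-side transcoding and down-sampling might look like the Python below, where the host, parameters and API key are all invented:

# Hypothetical music-locker client. The URL, query parameters and
# API key below are invented for illustration; consult mp3tunes.com/api
# for the real interface.

import urllib.parse
import urllib.request

def fetch_track(track_id, fmt="ogg", bitrate_kbps=96,
                base="https://locker.example.com/v1/track"):
    """Ask the locker to transcode/down-sample a track before sending it."""
    query = urllib.parse.urlencode({
        "id": track_id,
        "format": fmt,              # server-side transcoding
        "bitrate": bitrate_kbps,    # down-sampling for mobile links
        "apikey": "YOUR-KEY-HERE",  # placeholder credential
    })
    with urllib.request.urlopen(f"{base}?{query}") as resp:
        return resp.read()

# e.g., a low-bit-rate Ogg copy for a phone on a slow connection:
# audio = fetch_track("12345", fmt="ogg", bitrate_kbps=64)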

LJ: Let's look at the economics of this. I easily can see partnerships with the likes of Sonos, which is also Linux-based. Seems like $40 US for a service like yours is an argument against customers maintaining their own NAS (network attached storage) device.

MR: Yes, expecting people to manage their own NASes is like expecting people to run their own backyard power generator. It's just dumb. You'll have better service and greater cost efficiency by using a centralized system that can take advantage of economies of scale. We have talked to Sonos, and there's interest. I hope they support our API.

They Said It

I think that novels that leave out technology misrepresent life as badly as Victorians misrepresented life by leaving out sex.

—Kurt Vonnegut, A Man Without a Country (Random House, 2005)

Truth arrives in conversation. Not solitude....Those of us who resist...the privatizing of the cultural commons, might ask ourselves what it is we are trying to protect. For the content industries, the tools of enclosure (copyright term extension, digital rights management, etc.) have been called into being to protect and enhance revenues; it's about the money. For we who resist, the money is the least of it. Protecting the cultural commons means protecting certain ways of being human.

—Lewis Hyde, in a talk to Berkman Center, February 13, 2007

This song is Copyrighted in the U.S., under Seal of Copyright # 154085, for a period of 28 years, and anybody caught singin' it without our permission will be mighty good friends of ourn, cause we don't give a dern. Publish it. Write it. Sing it. Swing to it. Yodel it. We wrote it, that's all we wanted to do.

—Woody Guthrie

I have often reaped what others have sowed. My work is the work of a collective being that bears the name of Goethe.

—Johann Wolfgang Goethe, www.publicknowledge.org/resources/quotes

The preservation of the means of knowledge among the lowest ranks is of more importance to the public than all the property of the rich men in the country.

—John Adams