Introducing the new DevFS: Linux 2.4 will change the way you access your devices.
Every so often, something happens that is so breathtaking, so absolutely amazing, that it changes the world. Linux 2.4 still isn't one of those things. Linus' rapid-release plan hasn't produced a new kernel at anything like the rate many developers (myself included) expected. This article is an addendum to “The Bullet Points” (January 2000) and a follow-up to “Linux 2.4 Spotlight: ISA Plug-and-Play” (February 2000). For a more complete rundown of Linux 2.4, see “Linux 2.4 Scorecard” and “Wonderful World of Linux 2.4”, available on Linux Today (www.linuxtoday.com).
Linux 2.2 spent just over a month in its “pre-” stage, and so did 2.0. While the 2.4 kernel is in its final development cycle, this pre-series has stretched over several months, and it's still difficult to predict how long the official release will take. The 2.2 kernel had some “brown bag” problems that I'm sure Linus is not interested in repeating. According to the recent 2.4 bug list, there are 12 very important bugs that need fixing in the current pre-releases, 26 less important ones, several more minor ones, and plenty of items still to merge and verify. The list grows smaller daily, but it's still a big list.
Alan Cox's recent work on the “ac” series of kernels has helped to stabilize the code, but a number of items remain on the “must fix” list. I expect 2.4 to be released in a couple of months. The actual date of release, however, will be entirely up to Linus.
And now, in no particular order, more of the new features of Linux 2.4.
Nearly every variant of UNIX accesses block and character devices through a common model: special device files in the /dev directory. Unfortunately, different UNIX variants name these devices in drastically different ways. BSD variants, for instance, refer to IDE hard disks as /dev/wd* instead of /dev/hd*. (The characters “wd” stand for Western Digital, an early maker of IDE hard disks.) Some UNIX variants even use the /dev directory for network devices; Linux does not (its eth0, for example, has no /dev node).
Although the naming of devices differs across UNIX variants, the way these devices communicate with the kernel is generally the same. Under Linux 2.2, special files (device nodes) for each accessible device are placed in the /dev directory. A device node boils down to just two values, a major and a minor number. The major number generally corresponds to a driver or subsystem in the Linux kernel; the minor number generally corresponds to a sub-function or sub-device (e.g., a specific partition on a hard disk). Communication is actually done with these numbers. The names of the device nodes are standardized, but a node could be called anything an administrator wants. (All Linux distributions today use the same standard naming scheme, provided as a document with the kernel.)
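Nothing shows the plumbing better than looking at a node directly. Here is what the first partition of the first IDE disk looks like (the date and ownership will vary, but major 3, minor 1 is the standard assignment):

    $ ls -l /dev/hda1
    brw-rw----   1 root  disk  3,  1 May  5  1998 /dev/hda1

    # A node is nothing more than its numbers; mknod(1) can recreate it:
    # mknod <name> <b|c> <major> <minor>
    mknod /dev/hda1 b 3 1

Note that the name is irrelevant to the kernel; only the “b” (block), the 3 and the 1 matter.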
There are several shortcomings to the 2.2 /dev model. First, it uses a finite numeric name-space; as new devices are named in the standard, we inevitably run out of numbers. Second, modern devices (such as USB) don't map cleanly onto a static major/minor arrangement, and even complicated partitioning schemes strain the current infrastructure. Finally, since it's impossible to know in advance which devices a user will install, distributions create hundreds (sometimes more than a thousand) of device nodes in the /dev directory. Only a handful are used on a given machine; the rest sit there to satisfy every possible configuration of every compatible hardware device. It's a bit crazy, when you think about it. To better meet the demands of a more plug-and-play world, Linux 2.4 introduces the Device Filesystem (DevFS): in an elegant reworking of /dev, only configured devices are listed.
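The clutter is easy to measure on any 2.2-era system (the count below is illustrative; the exact figure varies by distribution):

    $ ls /dev | wc -l
    1207

Over a thousand nodes, of which a typical machine actively uses perhaps a few dozen.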
DevFS is a kernel-based file system. Like /proc, DevFS is seen in the filesystem tree (as /dev) but never gets “synced” to a physical device; /dev stays in RAM. Whenever a driver is loaded into the kernel and the device is detected, appropriate entries are added to the /dev tree.
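On kernels built with DevFS support but without automatic mounting (the configuration options are CONFIG_DEVFS_FS and CONFIG_DEVFS_MOUNT, respectively), it can be mounted by hand like any other kernel file system; a minimal sketch:

    # Nothing on disk backs this mount; /dev now lives in RAM.
    mount -t devfs none /dev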
Besides cleaning out the file system tree, DevFS will be faster, too. Under 2.2, when a normal device node is open()ed, the kernel looks up the major number in a table and calls the corresponding driver, which then works out the specific device using the minor number. Under DevFS, drivers register their /dev entries themselves, so an open() reaches the driver directly. DevFS removes a layer of indirection: programs talk to drivers rather than going through the major-number table.
With the introduction of DevFS, Linus has decided that 2.2-style device nodes are too limiting, and they are going the way of the dodo. Nearly all DevFS node names differ from their major/minor counterparts, usually substantially; device classes are now categorized in subdirectories (structurally similar to /proc/scsi/*). For compatibility with 2.2 systems, administrators may still use old-style node names (with the devfsd dæmon), but these are deprecated.
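A few illustrative translations under the DevFS naming scheme as it stands today (the details may still shift before release):

    /dev/hda1    becomes   /dev/ide/host0/bus0/target0/lun0/part1
    /dev/ttyS0   becomes   /dev/tts/0
    /dev/tty1    becomes   /dev/vc/1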
Unfortunately, there are some problems with DevFS as well. Permissions are more difficult to manage under the new system: they are assigned by the drivers rather than stored with nodes in the file system, so any changes are forgotten each time a module is reloaded (or the machine reboots). And since drivers (part of the kernel) name their own entries, the kernel now controls the naming conventions and the layout of /dev; the kernel is setting policy, which is traditionally considered a bad thing. You can move or rename DevFS files after boot, or change their permissions, but you start from the kernel's policy. (Scripts are available to record the state of /dev at shutdown.) Presently, it's not clear which routines the distributions will use, or whether they will use DevFS at all. DevFS is still marked “experimental”, after all, though that may change before release.
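The devfsd dæmon is also the usual answer to the permissions problem: it watches DevFS events and reapplies ownership and modes as entries appear. A minimal sketch of its configuration file (conventionally /etc/devfsd.conf); the syntax here is based on early devfsd releases and the group name is just an example, so double-check against devfsd's own documentation:

    # Maintain old-style compatibility names alongside the new ones.
    REGISTER    .*          MKOLDCOMPAT
    UNREGISTER  .*          RMOLDCOMPAT
    # Reassert ownership and mode whenever a serial port registers.
    REGISTER    ^tts/.*     PERMISSIONS root.dialout 0660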
While setting policy in the kernel is considered a bad thing, the many wins of the DevFS model have convinced Linus that the new way is the right way. Work is being done to make DevFS more “friendly” to administrators stung by its downsides—some of this is likely to make its way into the kernel before Linux 2.4 officially ships.
Another semi-revolutionary change in Linux 2.4 involves the way it handles disks and partitions. Previous versions of Linux were somewhat limited here: like most other operating systems, Linux sat directly on the standard hardware partition scheme. Using fdisk or a similar tool, you would create partitions to be formatted as “Linux swap” or “ext2”. Such partitions are difficult to change once in place; resizing or moving one was almost impossible, even for power users.
Previous versions of Linux include the “multiple device” (or “md”) driver, which allows Linux administrators to concatenate partitions and do more complicated maneuvers to create software RAID arrays. Using this driver and some footwork, it is possible to extend the native partition scheme somewhat, but not to the level of flexibility required by many applications.
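For the curious, here is roughly what concatenation looks like with the 2.2-era raidtools: the md driver's “linear” mode, described in /etc/raidtab and assembled with mkraid. The device names are examples, and the exact set of required fields is worth checking against the raidtools documentation:

    # /etc/raidtab: append /dev/hdb1 onto the end of /dev/hda3.
    raiddev /dev/md0
        raid-level            linear
        nr-raid-disks         2
        persistent-superblock 1
        device                /dev/hda3
        raid-disk             0
        device                /dev/hdb1
        raid-disk             1

After “mkraid /dev/md0” and an ordinary mke2fs, the pair behaves as one large partition.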
The Logical Volume Manager (LVM) has the power to make the Linux world a more flexible place. (As with the md driver, other operating systems generally won't understand Linux's LVM scheme.) Instead of dividing disks into static partitions, the LVM lets a Linux user concatenate several physical disk devices into a single “volume group”, which can then be divided into multiple “logical volumes”. The LVM allows volumes to be resized (within certain constraints) and moved, and more disks can be added to a volume group on the fly, allowing for massive storage capacity; the sky is almost literally the limit. The LVM subsystem is not new to the commercial UNIX world; the code is modeled partly on the implementations in Tru64 UNIX, HP-UX and other commercial UNIX systems.
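The user-space tools follow their HP-UX ancestors closely. A sketch of pooling two disks and carving out a volume (device and volume names are examples only):

    pvcreate /dev/hda3 /dev/hdb1        # label the physical volumes
    vgcreate vg00 /dev/hda3 /dev/hdb1   # pool them into a volume group
    lvcreate -L 8G -n home vg00         # carve out an 8GB logical volume
    mke2fs /dev/vg00/home               # then treat it like any partition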
As an added bonus of this effort, code to resize ext2 partitions has been released to the public. While this code lives out in user space, far away from the kernel, it is a very important component of the LVM subsystem. Even apart from the LVM, it will no doubt find its way into the next generation of installation programs and fdisk-like utilities.
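Put the two together, and growing a file system becomes a routine, if offline, operation. A sketch, assuming the resizer is installed as resize2fs and reusing the volume names from above:

    umount /dev/vg00/home             # ext2 cannot yet be resized while mounted
    lvextend -L +2G /dev/vg00/home    # grow the logical volume by 2GB
    resize2fs /dev/vg00/home          # grow ext2 to fill the larger volume
    mount /dev/vg00/home /home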
Linux 2.4 has been through several major and minor stumbles on its way to becoming suitable for very large environments. The hardware is largely to blame for the difficulty: i386 hardware, being 32-bit and somewhat archaic, cannot easily support huge files and the other trappings of a 64-bit operating system. This is a limitation of the platform, not of Linux itself. With the growing popularity of NT on i386 hardware, there has been a push to raise the bar where Linux can, overcoming the platform's limitations without sacrificing the cleanliness and speed of the current implementation. Two improvements in this area really stand out: more users per system and very large files.
First, one feature literally demanded by the enterprise is 32-bit user and group IDs (UIDs and GIDs). In the Linux and UNIX worlds, every user is given a unique number. Under 2.2 those numbers are 16-bit values, and a ceiling of roughly 65,000 users is constricting in some applications (e.g., a high-volume web-hosting site like Geocities or Tripod, or an ISP, will hit scaling problems at 65,000 users). Linux 2.4 bumps the limit up to about 4.3 billion (2^32). For comparison, the population of the world is just over 6 billion. [Almost enough for everybody to have an account on every single computer! —Ed.]
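On a 2.4 system with matching user space (glibc and the shadow utilities must also understand 32-bit IDs), nothing special is required; the IDs below are arbitrary examples:

    # UIDs and GIDs above the old 65,535 ceiling are now fair game:
    groupadd -g 70000 customers
    useradd  -u 70001 -g customers customer1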
Along with this trend of bigger and better, Linux 2.4 raises another ceiling: maximum file size. On 32-bit platforms, previous incarnations of Linux would choke on files larger than 2GB, even when the underlying file system could theoretically handle more. Although many people may not immediately see the benefit of incredibly large files, the feature is especially useful for managing information or media; imagine tarring all your MP3s into one convenient file, or putting them in a database.
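Large files also need cooperation from user space. A quick demonstration, plus the relevant build flag (the file name and sizes are arbitrary):

    # Write 3GB of zeros; on a 32-bit 2.2 system this fails at the 2GB mark.
    dd if=/dev/zero of=bigfile bs=1024k count=3072

    # Applications opt in through glibc's large-file interface, e.g.:
    #   gcc -D_FILE_OFFSET_BITS=64 -o myapp myapp.c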
It should be stressed that while Linux 2.4 allows for bigger and better applications, one of the main concerns of the development team has always been to optimize for the common case: real users. To this end, the code has been carefully designed to affect typical users as little as possible. While these features will be assets to the users who need them, they will not be a stumbling block for users who don't.
One benefit of the open-source methodology is that it is perfectly reasonable for a developer to “fork” a copy of the kernel and pursue development on the fork. Kernel hackers often employ this method when they want to go off and do their own thing for a while without paying careful attention to everyone else's work. As it turns out, however, such changes must eventually be merged back into the “official” kernel to be properly respected and supported. Although forks can be of any type, one common area for Linux to fork-and-merge is support for new processor families.
With the latest revision of the Linux kernel, several new processor families get their own place within the kernel sources. ia64, one of the most talked-about additions to the kernel in some time, is the forthcoming Intel processor intended as the 64-bit replacement for the i386 line; in a sense, it is to the i386 what the PPC was to Motorola's m68k. Hardware for this platform is virtually nonexistent and is not expected to reach consumers in quantity for years, but Linux will be there when the first motherboards and processors roll off the assembly line. In many ways, this is a demonstration of Linux's larger role in the operating-system market, as all previous Linux ports were done after the hardware and an alternative OS were already available.
SuperH is the embedded processor used in Pocket PC (a.k.a. Windows CE) machines and another addition to the “supported by Linux” clan. There's an irony to supporting Linux on WinCE hardware that many users just can't help but chuckle over. Again, this port is still in an early stage, but the developers are chugging along.
About as far toward the other end of the computing spectrum as you can get, the S/390 gets a port, too. The S/390 is the latest generation of IBM's mainframe line and probably the largest piece of hardware Linux is known to run on. Much of the port was done by IBM itself; that, too, is a first for the Linux community.
While it is true that Linux supports all of these new processors in the official kernel source distribution, these ports are not necessarily ready for prime time. Additionally, pre-compiled distributions for any of these processors may be a long way away from the “normal user” stage.
Although I probably shouldn't admit it, I know a number of system administrators who never bothered upgrading their kernels to Linux 2.2. Even though Linux 2.2 included a completely rewritten network layer, the cost of updating their old (2.0-era) scripts to the 2.2 command set was daunting to some. That said, Linux 2.4 rewrites the networking layer yet again and introduces an entirely new interface: iptables. But what about those people who don't want to upgrade again? This time around, Linux 2.4 includes compatibility modules for both the 2.0-era and 2.2-era tools (ipfwadm and ipchains, respectively). With compatibility modules lowering the cost of entry, it is hoped that this release of the kernel will be adopted more readily than the last.
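Using the old tools is simply a matter of loading the right compatibility module first. Note that the compatibility modules and the native iptables modules cannot be loaded at the same time; you pick one interface per boot:

    # Load 2.2 compatibility; existing ipchains rules then work unmodified:
    modprobe ipchains
    ipchains -A input -s 10.0.0.0/8 -j DENY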
The Linux 2.4 networking layer wasn't rewritten again for nothing. Network Address Translation (NAT) and firewalling have been made more flexible and split off into separate modules. With these modules, a Linux 2.4 system becomes nearly as powerful and flexible as modern-day commercial routing hardware. Of course, to use the really nifty features of the new kernel, you have to be using the “real” iptables interface and not either of the compatibility interfaces.
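As a taste of the native interface, here are the pieces of a minimal masquerading firewall, assuming eth0 is the external interface:

    # Rewrite the source address of outbound traffic (NAT):
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    # Stateful filtering, something ipchains could not do; admit only replies:
    iptables -A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT
    # And turn on forwarding:
    echo 1 > /proc/sys/net/ipv4/ip_forward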
While the new flexibility may be enough to convince hard-core network people to upgrade, the 2.4 kernel also includes more general fixes and speedups in the networking layer. David Miller and the rest of the networking gurus have been hard at work making sure Linux 2.4 talks more efficiently to other operating systems. The networking layer and the TCP/IP stack have been rewritten to scale better on multi-processor machines, and network device drivers have been reworked to improve stability and eliminate some possible race conditions in the infrastructure. These changes build further on the great work done on Linux networking during the 2.2 development cycle.
Linux 2.2 included the first official support for frame-buffer graphics devices in the kernel; Linux 2.4 adds a new interface for kernel-level control of graphics hardware. The Direct Rendering Manager (DRM) is a system for keeping multiple demanding video processes in check. Rather than being a complete video driver in itself (such things are better left in user space), it makes user-space video more stable (and secure) by providing a kernel interface that controls and synchronizes access to graphics devices. Supported programs, such as XFree86 4.x, talk to this interface whenever a hardware resource is needed. The kernel knows when multiple programs attempt simultaneous access to video structures, and saves state, or does whatever else is necessary, to make sure they don't conflict. Since supported programs cannot send conflicting requests to the graphics hardware, such conflicts can no longer cause a crash. This feature is largely geared to advanced accelerated hardware, but lower-end hardware may benefit from the new resource-allocation routines as well.
One major area of improvement in Linux 2.4 is the number of device types it supports. I already wrote about Linux 2.4's support for USB, ISA Plug-and-Play and PC Card devices in my previous articles. This picture would not be complete, however, without mention of support for Firewire and I2O (Intelligent Input/Output) devices, two relatively new additions to the PC hardware market.
Firewire, also known as IEEE 1394, is a high-speed external bus similar in concept to USB. (You may also hear it called by Sony's name: i.Link.) Unlike USB, Firewire supports multiple computers on the same bus and offers considerably higher transfer speeds. Thanks to that bandwidth, Firewire has proven most useful for digital video cameras and similar devices that need to move a lot of data quickly. It should be noted that, although the underlying bus is supported under Linux, not all hardware chip sets and devices are supported yet. This support will improve over time as more hardware becomes available.
I2O is a new type of I/O subsystem that offers operating-system independence in addition to high-speed data transfers. In theory, this means one driver works with every device of a given class, regardless of vendor or of how the device operates internally. Unfortunately for us, relatively few I2O devices have been made so far, and the kernel support is still somewhat incomplete.
Although Firewire and I2O are new to the Linux sphere and little hardware for either bus actually exists yet, the open-source snowball is rolling, and support will improve as the devices become more common.
Linux 2.4 is shaping up to be the best version of Linux yet... oh, wait, I already used the line for Linux 2.2. With many new features for desktop and enterprise, the new kernel has something for everyone. Bring the kids!
Joseph Pranevich (jpranevich@lycos.com) is an avid Linux geek and, while not working for Lycos, enjoys writing (all kinds) and working with a number of open-source projects.