SIDUS—the Solution for Extreme Deduplication of an Operating System

Emmanuel Quemener

Marianne Corvellec

Issue #235, November 2013

Probe corrupted computers without disassembling anything, and provide users with a full-featured environment in just a few seconds.

SIDUS (Single-Instance Distributing Universal System) was developed at Centre Blaise Pascal (Ecole normale supérieure de Lyon, Lyon, France), where one administrator alone is in charge of 180 stations. Emmanuel Quemener started SIDUS in February 2010, and he significantly cut his workload for administering this park of stations. SIDUS is now in use at the supercomputing centre PSMN (Pôle Scientifique de Modélisation Numérique) of the Ecole normale supérieure de Lyon.

With SIDUS, you can provide a new user with a complete functional environment in just a few seconds. You can probe corrupted computers without disassembling anything. You can test new equipment without installing an OS on it. You can make your life so much easier when managing hundreds of cluster nodes, workstations or self-service stations. You can drastically reduce the amount of storage needed for the OS on these machines.

Disclaimer

SIDUS is not LTSP. LTSP is a solution for the simplified management of thin terminals through X11 or RDP access to a server, so all the processing load falls on that server. On the contrary, SIDUS makes full use (or partial use, as the user wishes) of the station's resources. Only the OS is stored remotely.

SIDUS is not FAI. FAI or Kickstart offer fully automated installs so that administration can be reduced or dispensed with altogether. On the contrary, SIDUS offers a single system tree that integrates the base system as well as all manually installed applications.

SIDUS is flexible. When organizing IT-training sessions, you might want to give participants a specific virtual environment. But once they download it, you cannot modify it for them. SIDUS offers users a single given environment that is easily configurable at any moment.

SIDUS is not exotic. SIDUS makes use of services available with any distribution (DHCP, PXE, TFTP, NFSroot, DebootStrap and AUFS). You can install SIDUS knowing only these few keywords. Besides, SIDUS makes use of distribution tricks from live CDs. SIDUS works on Debian, all the way from version Etch.

How Good Is SIDUS?

SIDUS is:

  • Universal: platform-independent, x86 or x86_64 architectures.

  • Efficient: installing takes a few minutes, and booting takes a few seconds.

  • Energy-saving: it takes only one core, 1GB of RAM, 40GB of disk space and a Gigabit Ethernet network.

  • Scalable: tested successfully on a hundred nodes.

  • Multipurpose: we chose to use Debian as it comes with broad integration of open-source scientific software.

Installing SIDUS on Your System

It takes a little preparation for your system to host SIDUS. Three services are needed in order to deploy the clients: DHCP, TFTP and NFS servers. You also need either to be on good terms with your own IT staff or to have free access to well-configured LDAP and DNS servers. Here is how the three deployment services come into play:

  • DHCP service provides the client with an IP address and also propagates two complementary pieces of information: the IP address of the TFTP server (the “next-server” variable) and the name of the PXE binary, often called pxelinux.0.

  • TFTP service then comes into play. It serves everything the client needs to boot: the pxelinux.0 binary, the kernel and the initial ramdisk of the client's system. If you need to pass specific parameters to a given client, you just create a dedicated file whose name stems from the client's MAC address (prefixing it with 01 and replacing : with -); see the example after this list.

  • NFS service now enters the loop: it provides the system's root via its protocol (NFSroot). Accordingly, you will install your client system in this root, for example /srv/nfsroot/sidus.
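
For instance, a client whose network card has the (hypothetical) MAC address 00:11:22:33:44:55 would look for its dedicated boot file at:

/srv/tftp/pxelinux.cfg/01-00-11-22-33-44-55

If no such file exists, pxelinux falls back to the default file.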

In our configuration, we have used isc-dhcp-server, tftpd-hpa and nfs-kernel-server for the servers DHCP, TFTP and NFS, respectively. Let's look into this configuration.

For DHCP, the configuration file (/etc/dhcp/dhcpd.conf) reads:

next-server 172.16.20.251;
filename "pxelinux.0";
allow booting;
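
Note that these directives normally live inside a subnet or host declaration. As a minimal sketch, assuming the clients sit on a hypothetical 172.16.20.0/24 network, the relevant part of dhcpd.conf might look like this:

subnet 172.16.20.0 netmask 255.255.255.0 {
    range 172.16.20.100 172.16.20.200;    # address pool for the SIDUS clients
    option routers 172.16.20.254;         # hypothetical gateway
    next-server 172.16.20.251;            # TFTP server delivering pxelinux.0
    filename "pxelinux.0";
    allow booting;
}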

For TFTP, there are three files and one directory (pxelinux.cfg) in /srv/tftp:

./pxelinux.0 
./vmlinuz-Sidus
./initrd.img-Sidus
./pxelinux.cfg

The pxelinux.0 file comes from the syslinux-common package. In pxelinux.cfg, there is the file called default.

To boot, you need the following: the kernel vmlinuz-Sidus, the initial ramdisk initrd.img-Sidus and the NFSroot server 10.13.20.13 with the mountpoint /srv/nfsroot/sidus.

Below is an example of a boot file. It defines two labels, tmpfs and iscsi (we'll come back to the iscsi label later on):

DEFAULT tmpfs

LABEL tmpfs
KERNEL vmlinuz-Sidus
APPEND console=tty1 root=/dev/nfs
    initrd=initrd.img-Sidus
    nfsroot=10.13.20.13:/srv/nfsroot/sidus,
    rsize=8192,wsize=8192,tcp ip=dhcp aufs=tmpfs

LABEL iscsi
KERNEL vmlinuz-Sidus
APPEND console=tty1 root=/dev/nfs
    initrd=initrd.img-Sidus
    nfsroot=10.13.20.13:/srv/nfsroot/wheezy64,
    rsize=8192,wsize=8192,tcp ip=dhcp aufs=iscsi
    ISCSI_TARGET_IP=10.13.20.14
    ISCSI_INITIATOR=iqn.2013-04.zone.sidus.target:
    default root=LABEL=ISCSI

Regarding the NFS server, it takes one line in the file /etc/exports to configure it:

/srv/nfsroot/sidus 10.13.20.0/255.255.255.0(ro,no_subtree_check,async,no_root_squash)

Here, we open a read-only access to stations with IP between 10.13.20.1 and 10.13.20.254.
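
After editing /etc/exports, make the new export effective with the standard exportfs command:

exportfs -ra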

Once you have configured these three services (DHCP, TFTP and NFS), you can install a full SIDUS. Note that you also will need a root for user accounts (via NFSv4) and a process enabling their identification/authentication (via LDAP or Kerberos). We have deployed SIDUS on environments where these services are provided by third-party servers but also on standalone environments. Installing an OpenLDAP server with SSL or a Kerberos server is off-topic, so we simply show the client configuration files for our infrastructure (again, LDAP for identification/authentication and NFSv4 for user folders).

Install the Debian Base with Debootstrap

With Debootstrap, you can install a system in an extra root location. Debootstrap needs you to specify parameters, such as the install root, the hardware architecture, the distribution and the FTP or HTTP Debian mirror to download the packages from.

Warning: this is where we get Debian-specific. Debootstrap is a familiar tool for all Debian-like distributions (typically, it is available on Ubuntu). Porting the approach to Red Hat-like distributions should not be too difficult though: there is a Fedora clone called febootstrap, which we have not tested.

Debootstrap (wiki.debian.org/Debootstrap) also takes as input a list of archive areas (as we all know, Debian is very particular about distinguishing the main area from contrib and non-free), a list of packages to include and a list of packages to exclude. Ideally, those two lists would be enough, but you cannot handle everything with Debootstrap alone: we install from the very beginning a set of tools we deem necessary (such as the kernel, some firmware and auditing tools).

We define an environment variable corresponding to the root of our SIDUS system, as well as a command that enables the execution of commands via chroot, with a specific option for non-interactive package installs. The variable $MyInclude corresponds to the (comma-separated) list of packages you want, and $MyExclude to the list of packages you do not want:
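
As an illustration, the two lists might be defined as follows (the package names here are hypothetical examples, not the article's actual selection):

MyInclude="linux-image-amd64,firmware-linux,openssh-server,nfs-common"
MyExclude="nano,tasksel"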

export SIDUS=/srv/nfsroot/sidus
time debootstrap --arch amd64 \
    --components='main,contrib,non-free' \
    --include=$MyInclude --exclude=$MyExclude \
    wheezy $SIDUS http://ftp.debian.org/debian

Precautions before Moving on with the Install

After running the last-mentioned command, you should be a little cautious. A Debian package normally starts its services right after install. You need to define a hook (policy-rc.d) to inhibit the starting of services inside the chroot. After completion of the install, you can remove this hook:

printf '#!/bin/sh\nexit 101\n' > ${SIDUS}/usr/sbin/policy-rc.d
chmod +x ${SIDUS}/usr/sbin/policy-rc.d

Some packages require access to the list of processes (/proc), to system information (/sys), to shared memory (/run/shm) and to pseudo-terminals (/dev/pts). Hence, you should mount /proc and /sys inside the chroot and bind the corresponding host system folders into SIDUS:

alias sidus="DEBIAN_FRONTEND=noninteractive chroot
    ${SIDUS} $@"
sidus mount -t proc none /proc
sidus mount -t sysfs sys /sys
mount --bind /run/shm ${SIDUS}/run/shm
mount --bind /dev/pts ${SIDUS}/dev/pts

Install Additional Packages (Scientific Libraries)

To make it simpler when installing packages of the same family, Debian ships with meta-packages. In our case, we are interested in the scientific ones: their names are prefixed with “science-”. For example, “science-chemistry” includes all the chemistry packages. You can install all the scientific packages with a single command:

time sidus apt-get install --install-suggests -f \
    -m -y --force-yes science-*

Because we are talking about a full-featured OS, we also install the suggested packages: the option --install-suggests is available from Wheezy onward (released May 5, 2013).

When installing, the costliest phase is downloading packages and configuring certain components (Perl and LaTeX). In the best-case scenario, it takes 45 minutes for a 32GB full tree. There is a price to pay for this install craze. Some packages do not install well, and you will want to purge some, such as a M*tlab installer:

time sidus apt-get purge -y -f --force-yes matlab-*

Local Environment

Usually, you will want to adapt the system to a local environment (authentication and user sharing). The default is US, so you may want to configure:

  • ${SIDUS}/etc/locale.gen.

  • ${SIDUS}/etc/timezone.

  • ${SIDUS}/etc/default/keyboard.

For LDAP authentication, you may want to configure: ${SIDUS}/etc/nsswitch.conf, ${SIDUS}/etc/libpam_ldap.conf, ${SIDUS}/etc/libnss-ldap.conf and ${SIDUS}/etc/ldap/ldap.conf.
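
As an illustration of the LDAP part, ${SIDUS}/etc/nsswitch.conf typically ends up with lines like these (a sketch; the exact sources depend on your directory setup):

passwd:         files ldap
group:          files ldap
shadow:         files ldap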

As for the mounting of NFS user folders, configure the following (an example fstab line follows this list):

  • ${SIDUS}/etc/default/nfs-common, ${SIDUS}/etc/default/idmapd.conf and ${SIDUS}/etc/fstab (for NFSv4).

  • ${SIDUS}/etc/fstab (for NFSv3).
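
For instance, a single NFSv4 entry in ${SIDUS}/etc/fstab might look like this (the server name and export path are hypothetical):

srv-home.example.org:/home   /home   nfs4   rw,hard   0   0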

Set Up the Boot Sequence

How do you share SIDUS without duplicating it? We found the best solution to be the one used by live CDs. The boot sequence includes the two layers you need: a read-only layer (on optical media for a live CD, on NFS in our case) and a read-write layer (in TMPFS). The two layers are joined via AUFS (the successor of UnionFS). Everything is taken care of by a single hook upon boot (the script called rootaufs). It operates in five steps; a minimal sketch of such a hook follows the list:

  1. Creates the temporary files /ro, /rw and /aufs.

  2. Moves the root of NFSroot from the original mountpoint to /ro.

  3. Mounts the local or remote partition.

  4. Superimposes /ro and /rw into /aufs.

  5. Moves /aufs into the original mountpoint.

The rootaufs script goes into ${SIDUS}/etc/initramfs-tools/scripts/init-bottom.
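
As a minimal sketch (tmpfs case only, without the error handling and iSCSI support of the real script), such an init-bottom hook might look like this:

#!/bin/sh
# Sketch of an init-bottom hook in the spirit of rootaufs (tmpfs case only).
PREREQ=""
case "$1" in
    prereqs) echo "$PREREQ"; exit 0 ;;
esac

mkdir -p /ro /rw /aufs                        # 1. temporary mountpoints
mount --move ${rootmnt} /ro                   # 2. NFSroot becomes the read-only branch
mount -t tmpfs none /rw                       # 3. volatile read-write layer in RAM
mount -t aufs -o br=/rw=rw:/ro=ro none /aufs  # 4. superimpose the two layers
mount --move /aufs ${rootmnt}                 # 5. hand the union back as the root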

The original script is inspired by the rootaufs project by Nicholas A. Schembri (code.google.com/p/rootaufs). We adapted it extensively to match our infrastructure. A version is available at www.cbp.ens-lyon.fr/sidus/rootaufs:

wget -O \
    ${SIDUS}/etc/initramfs-tools/scripts/init-bottom \
    http://www.cbp.ens-lyon.fr/sidus/rootaufs

The system is not functional yet. You need to create an initrd specific to your NFS boot. Add aufs in ${SIDUS}/etc/initramfs-tools/modules and force eth0 as DEVICE in ${SIDUS}/etc/initramfs-tools/initramfs.conf:

sidus update-initramfs -k all -u
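
Those two edits (to be made before regenerating the initrd) can be scripted from the host, for instance like this; the sed expression is ours and assumes initramfs.conf already contains a DEVICE= line, as it does by default:

echo aufs >> ${SIDUS}/etc/initramfs-tools/modules
sed -i 's/^DEVICE=.*/DEVICE=eth0/' ${SIDUS}/etc/initramfs-tools/initramfs.conf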

Then, you just copy the kernel and the initrd to the TFTP directory, using the names referenced in the boot file:

cp ${SIDUS}/vmlinuz /srv/tftp/vmlinuz-Sidus
cp ${SIDUS}/initrd.img /srv/tftp/initrd.img-Sidus

How can you take advantage of SIDUS while keeping a given configuration from one boot to the next? Mounting NFS on each node separately is very costly. It is preferable to mount iSCSI on each node.

Originally, we investigated how to offer a second NFS share in read-write mode to ensure persistence of client-related changes from one boot to the next. This version, although functional, required an atomized NFS—one for each client. This was not sustainable for the server.

Therefore, we decided on another solution to ensure persistence: we create an iSCSI share for each client. The settings for mounting the iSCSI disk are defined on the kernel command line (the APPEND line shown earlier).

So we use a network drive based on iSCSI technology. In the config file /srv/tftp/pxelinux.cfg/default, it corresponds to the iscsi label. Each SIDUS client needs its own iSCSI storage space to ensure persistence. For the sake of simplicity, during the initrd boot sequence, each SIDUS client fetches the volume that bears its own IP. The rootaufs file contains a default login/password.

A few tricks (the first three are illustrated with commands right after this list):

  • Erase /etc/hostname to set the hostname through DHCP.

  • Set /etc/resolv.conf with a hard-coded definition.

  • Define a loopback in /etc/network/interfaces.

  • Change the booting of GDM3 so it starts only after NSCD is launched.

  • Set /etc/security/limits.conf (essential in an HPC environment).

  • Set /etc/fstab with input from the NFS server of user accounts.

  • For VirtualBox-based virtual systems, install VBoxLinuxAdditions.run in the SIDUS system.

  • For systems with an InfiniBand card, force loading of modules in /etc/modules and regenerate initrd. In /etc/rc.local, execute a script that gets the Ethernet IP address and builds an IP address for the InfiniBand card.

  • For systems with an NVIDIA card: with most NVIDIA cards, packages offered with Debian Wheezy let you install the necessary proprietary drivers and the OpenGL, Cuda and OpenCL libraries. Be careful if you want to use the OpenCL ICD (Installable Client Driver) for AMD to operate your processors and your graphics board simultaneously. To be able to do so, we had to install the entire environment (drivers, Cuda and OpenCL) from scratch.

  • For systems with an AMD ATI card: with most ATI cards, packages offered with Debian Wheezy let you install the necessary proprietary drivers and the OpenGL, Cuda and OpenCL libraries.
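
For instance, the first three tricks might be applied like this (the resolver address below is a hypothetical example):

rm -f ${SIDUS}/etc/hostname                              # hostname will come from DHCP
echo "nameserver 10.13.20.1" > ${SIDUS}/etc/resolv.conf  # hard-coded resolver
printf 'auto lo\niface lo inet loopback\n' > ${SIDUS}/etc/network/interfaces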

At the present time at CBP, we use the technique “NFSroot + iSCSI = AUFS” on SIDUS stations that require persistence, such as DiStoNet nodes (forge.cbp.ens-lyon.fr/projects/distonet). Otherwise, we use “NFSroot + TMPFS = AUFS”.

Install Wrap-up

We unmount all system folders necessary for installation:

umount ${SIDUS}/run/shm
umount ${SIDUS}/dev/pts
sidus umount /proc/sys/fs/binfmt_misc
sidus umount /proc
sidus umount /sys

We activate the startup dæmons:

rm -f ${SIDUS}/usr/sbin/policy-rc.d
cp /usr/sbin/start-stop-daemon \
    ${SIDUS}/usr/sbin/start-stop-daemon

We remove all the runtime files left behind by the processes launched during the install:

rm -r ${SIDUS}/run/* ${SIDUS}/tmp/*

This purges all the leftover traces of processes related to the SIDUS install.

Adapting to Heterogeneity

The park of computers may consist of cluster nodes (with fast network equipment), workstations (with embedded GPUs) or virtual machines (which require data sharing and GPU acceleration). For a large park, you do not want persistence. Instead, you adapt to each case with boot scripts, a separate SIDUS tree or the installation of third-party components.

Administering the system is not as easy as installing it. The gain you experience in installing the system more than makes up for the pain you experience in administering the system though. With SIDUS, every administration phase abides by the installation mechanisms: protection against booting and mounting of system folders.

Administration techniques are similar to those for the initial install. We use a script so that commands executed in SIDUS are surrounded by the required pre/post operations (a sketch appears below); we run it either automatically (typically for updates) or manually. At the end of the day, these additional commands represent a negligible burden with respect to the benefits we get. We now have a large number of stations that are bit-for-bit identical to a given base system.
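
A minimal sketch of such a wrapper (our own illustration, not the actual CBP script): it reproduces the install-time protections, runs the requested command in the SIDUS tree and cleans up afterward:

#!/bin/sh
# sidus-admin: run a command inside the SIDUS tree with the pre/post operations
SIDUS=/srv/nfsroot/sidus
# pre: inhibit service startup and mount the pseudo-filesystems
printf '#!/bin/sh\nexit 101\n' > ${SIDUS}/usr/sbin/policy-rc.d
chmod +x ${SIDUS}/usr/sbin/policy-rc.d
chroot ${SIDUS} mount -t proc none /proc
chroot ${SIDUS} mount -t sysfs sys /sys
mount --bind /dev/pts ${SIDUS}/dev/pts
# the administration command itself (for example: apt-get update)
DEBIAN_FRONTEND=noninteractive chroot ${SIDUS} "$@"
# post: unmount and re-enable service startup
umount ${SIDUS}/dev/pts
chroot ${SIDUS} umount /proc /sys
rm -f ${SIDUS}/usr/sbin/policy-rc.d

Other benefits: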

  • SIDUS works on user stations. The individual workstations can be considered shared. We started with a dozen Neoware thin clients that were memory-enhanced and overclocked. We now have about 20 of those.

  • SIDUS works on cluster nodes. In March 2010, we had a proof of concept with 24 nodes. Nowadays, SIDUS serves 86 permanent nodes over four different hardware architectures.

  • SIDUS works on virtual stations. Every year since 2011, Université Joseph Fourier has organized a summer school on scientific computing. For ten busy days, students train hands-on, so it is necessary to offer them a homogeneous environment in no time. Two virtual images are offered: a persistent one they can keep using after the summer school and another one served via SIDUS. This way, teachers can adapt materials and activities day by day. Since summer 2012, this solution has been used at the Laboratory of Chemistry of ENS de Lyon as well.

  • SIDUS works on suspicious stations. Booting via the network makes it possible to investigate a station's mass storage while the installed system stays shut down. There is no need for a live CD, which is always short of your ideal forensic tool.

  • SIDUS works on loan stations. Hardware manufacturers usually offer assessment equipment, and the install phase can be tedious on recent hardware. Using SIDUS, the system boots just as it does on the equipment already in use; for example, it takes a few minutes to boot 20 nodes.

Conclusion

Who should use SIDUS and why?

  • Users: you choose the resources on which you want to boot your station. Therefore, workstations can be segmented at will. The VirtualBox version of SIDUS has been tested successfully on Linux, MS Windows and Mac OS. GPU acceleration and sharing with the host are available. Users find themselves in the same environment as the nodes'. This makes code integration tremendously easier. Performance-wise, losses due to virtualization vary between 10% and 20% for VirtualBox and about 5% for KVM.

  • Administrators: a given operation propagates to the entire infrastructure, as if simply syncing over the SIDUS tree. The install takes a few tens of minutes for a full-featured system. To work out minor differences between systems, simple scripts or Puppet recipes do the job. In the case of larger differences, just build another SIDUS tree. The SIDUS tree might even be cloned instantaneously using snapshot tools (LVM or, better, ZFSonLinux).

  • For the sake of experiments: the SIDUS environment offers scientists and system engineers a framework for conducting reproducible experiments. Two nodes booting on the same SIDUS base do run the exact same system. This way, even if the stations are not actually identical, relevant tests still can be carried out.

How much does it cost in terms of resources? To get an idea, the clusters' server (also a gateway) at CBP hosts the DHCP, DNS, TFTP and NFS services, as well as an OAR batch server. When booting the entire infrastructure (88 nodes), the NFS server handles a throughput of 900Mb/s.

To conclude, you will want to use SIDUS on a variety of environments, be they HPC nodes, workstations or virtual machines. SIDUS gives unprecedented flexibility to both users and administrators. It is so energy-efficient and it propagates so rapidly, you won't want to live without it!

Emmanuel Quemener defines his job as an “IT test pilot”. His work at the HPC “Centre Blaise Pascal” (Lyon, France) involves software integration, storage, scientific computing with GPUs and technology transfer in science.

Marianne Corvellec is a physicist who was gratefully exposed to the wonderful world of free/libre and open-source software. She wants to make computational science a better place, promoting best practices and interacting with software experts.