Jailhouse

Valentine Sinitsyn

Issue #252, April 2015

A new approach to real-time security-wise virtualization in Linux.

Because you're a reader of Linux Journal, you probably already know that Linux has a rich virtualization ecosystem. KVM is the de facto standard, and VirtualBox is widely used for desktop virtualization. Veterans should remember Xen (it's still in a good shape, by the way), and there is also VMware (which isn't free but runs on Linux as well). Plus, there are many lesser-known hypervisors like the educational lguest or hobbyist Xvisor. In such a crowded landscape, is there a place for a newcomer?

There likely is not much sense in creating yet another Linux-based “versatile” hypervisor (other than doing it just for fun, you know). But, there are some specific use cases that general-purpose solutions just don't address quite well. One such area is real-time virtualization, which is frequently used in industrial automation, medicine, telecommunications and high-performance computing. In these applications, dedicating a whole CPU or its core to the software that runs bare metal (with no underlying OS) is a way to meet strict deadline requirements. Although it is possible to pin a KVM instance to the processor core and pass through PCI devices to guests, tests show the worst-case latency may be above some realistic requirements (see Resources).

As usual with free software, the situation is getting better with time, but there is one other thing—security. Sensitive software systems go through rigorous certifications (like Common Criteria) or even formal verification procedures. If you want them to run virtualized (say, for consolidation purposes), the hypervisor must isolate them from non-certifiable workloads. This implies that the hypervisor itself must be small enough; otherwise, it may end up being larger (and more “suspicious”) than the software it segregates, thus devastating the whole idea of isolation.

So, it looks like there is some room for a lightweight (for the real-time camp), small and simple (for security folks) open-source Linux-friendly hypervisor for real-time and certifiable workloads. That's where Jailhouse comes into play.

New Guy on the Block

Jailhouse was born at Siemens and has been developed as a free software project (GPLv2) since November 2013. Last August, Jailhouse 0.1 was released to the general public. Jailhouse is rather young and more of a research project than a ready-to-use tool at this point, but now is a good time to become acquainted it and be prepared to meet it in production.

From the technical point of view, Jailhouse is a static partitioning hypervisor that runs bare metal but cooperates closely with Linux. This means Jailhouse doesn't emulate resources you don't have. It just splits your hardware into isolated compartments called “cells” that are wholly dedicated to guest software programs called “inmates”. One of these cells runs the Linux OS and is known as the “root cell”. Other cells borrow CPUs and devices from the root cell as they are created (Figure 1).

Figure 1. A visualization of Linux running-bare metal (a) and under the Jailhouse hypervisor (b) alongside a real-time application. (Image from Yulia Sinitsyna; Tux image from Larry Ewing.)

Besides Linux, Jailhouse supports bare-metal applications, but it can't run general-purpose OSes (like Windows or FreeBSD) unmodified. As mentioned, there are plenty of other options if you need that. One day Jailhouse also may support running KVM in the root cell, thus delivering the best of both worlds.

As mentioned previously, Jailhouse cooperates closely with Linux and relies on it for hardware bootstrapping, hypervisor launch and doing management tasks (like creating new cells). Bootstrapping is really essential here, as it is a rather complex task for modern computers, and implementing it within Jailhouse would make it much more complex. That being said, Jailhouse doesn't meld with the kernel as KVM (which is a kernel module) does. It is loaded as a firmware image (the same way Wi-Fi adapters load their firmware blobs) and resides in a dedicated memory region that you should reserve at Linux boot time. Jailhouse's kernel module (jailhouse.ko, also called “driver”) loads the firmware and creates /dev/jailhouse device, which the Jailhouse userspace tool uses, but it doesn't contain any hypervisor logic.

Jailhouse is an example of Asynchronous Multiprocessing (AMP) architecture. Compared to traditional Symmetric Multiprocessing (SMP) systems, CPU cores in Jailhouse are not treated equally. Cores 0 and 1 may run Linux and have access to a SATA hard drive, while core 2 runs a bare-metal application that has access only to a serial port. As most computers Jailhouse can run on have shared L2/L3 caches, this means there is a possibility for cache thrashing. To understand why this happens, consider that Jailhouse maps the same guest physical memory address (GPA) to a different host (or real) physical address for different inmates. If two inmates occasionally have the same GPA (naturally containing diverse data) in the same L2/L3 cache line due to cache associativity, they will interfere with each other's work and degrade the performance. This effect is yet to be measured, and Jailhouse currently has no dedicated means to mitigate it. However, there is a hope that for many applications, this performance loss won't be crucial.

Now that you have enough background to understand what Jailhouse is (and what it isn't), I hope you are interested in learning more. Let's see how to install and run it on your system.

Building Jailhouse

Despite having a 0.1 release now, Jailhouse still is a young project that is being developed at a quick pace. You are unlikely to find it in your distribution's repositories for the same reasons, so the preferred way to get Jailhouse is to build it from Git.

To run Jailhouse, you'll need a recent multicore VT-x-enabled Intel x86 64-bit CPU and a motherboard with VT-d support. By the time you read this article, 64-bit AMD CPUs and even ARM (v7 or better) could be supported as well. The code is already here (see Resources), but it's not integrated into the mainline yet. At least 1GB of RAM is recommended, and even more is needed for the nested setup I discuss below. On the software side, you'll need the usual developer tools (make, GCC, Git) and headers for your Linux kernel.

Running Jailhouse on real hardware isn't straightforward at this time, so if you just want to play with it, there is a better alternative. Given that you meet CPU requirements, the hypervisor should run well under KVM/QEMU. This is known as a nested setup. Jailhouse relies on some bleeding-edge features, so you'll need at least Linux 3.17 and QEMU 2.1 for everything to work smoothly. Unless you are on a rolling release distribution, this could be a problem, so you may want to compile these tools yourself. See the Getting Up to Date sidebar for more information, and I suggest you have a look at it even if you are lucky enough to have the required versions pre-packaged. Jailhouse evolves and may need yet unreleased features and fixes by the time you read this.

Make sure you have nested mode enabled in KVM. Both kvm-intel and kvm-amd kernel modules accept the nested=1 parameter, which is responsible just for that. You can set it manually, on the modprobe command line (don't forget to unload the previous module's instance first). Alternatively, add options kvm-intel nested=1 (or the similar kvm-amd line) to a new file under /etc/modprobe.d.

You also should reserve memory for Jailhouse and the inmates. To do this, simply add memmap=66M$0x3b000000 to the kernel command line. For one-time usage, do this from the GRUB menu (press e, edit the command line and then press F10). To make the change persistent, edit the GRUB_CMDLINE_LINUX variable in /etc/default/grub on the QEMU guest side and regenerate the configuration with grub-mkconfig.

Now, make a JeOS edition of your favorite distribution. You can produce one with SUSE Studio, ubuntu-vm-builder and similar, or just install a minimal system the ordinary way yourself. It is recommended to have the same kernel on the host and inside QEMU. Now, run the virtual machine as (Intel CPU assumed):

qemu-system-x86_64 -machine q35 -m 1G -enable-kvm -smp 4 
  ↪-cpu kvm64,-kvm_pv_eoi,-kvm_steal_time,-kvm_asyncpf,
↪-kvmclock,+vmx,+x2apic -drive 
 ↪file=LinuxInstallation.img,id=disk,if=none 
 ↪-virtfs local,path=/path/to/jailhouse,
↪security_model=passthrough,mount_tag=host 
 ↪-device ide-hd,drive=disk -serial stdio 
 ↪-serial file:com2.txt

Note, I enabled 9p (-virtfs) to access the host filesystem from the QEMU guest side; /path/to/jailhouse is where you are going to compile Jailhouse now. cd to this directory and run:

git clone git@github.com:siemens/jailhouse.git jailhouse
cd jailhouse
make

Now, switch to the guest and mount the 9p filesystem (for example, with mount -t 9p host /mnt). Then, cd to /mnt/jailhouse and execute:

sudo make firmware_install
sudo insmod jailhouse.ko

This copies the Jailhouse binary image you've built to /lib/firmware and inserts the Jailhouse driver module. Now you can enable Jailhouse with:

sudo tools/jailhouse enable configs/qemu-vm.cell

As the command returns, type dmesg | tail. If you see “The Jailhouse is opening.” message, you've successfully launched the hypervisor, and your Linux guest now runs under Jailhouse (which itself runs under KVM/QEMU). If you get an error, it is an indication that your CPU is missing some required feature. If the guest hangs, this is most likely because your host kernel or QEMU are not up to date enough for Jailhouse, or something is wrong with qemu-vm cell config. Jailhouse sends all its messages to the serial port, and QEMU simply prints them to the terminal where it was started (Figure 2). Look at the messages to see what resource (I/O port, memory and so on) caused the problem, and read on for the details of Jailhouse configuration.

Figure 2. A typical configuration issue: Jailhouse traps “prohibited” operation from the root cell.

Configs and Inmates

Creating Jailhouse configuration files isn't straightforward. As the code base must be kept small, most of the logic that takes place automatically in other hypervisors must be done manually here (albeit with some help from the tools that come with Jailhouse). Compared to libvirt or VirtualBox XML, Jailhouse configuration files are very detailed and rather low-level. The configuration currently is expressed in the form of plain C files (found under configs/ in the sources) compiled into raw binaries; however, another format (like DeviceTree) could be used in future versions.

Most of the time, you wouldn't need to create a cell config from scratch, unless you authored a whole new inmate or want the hypervisor to run on your specific hardware (see the Jailhouse for Real sidebar).

Cell configuration files contain information like hypervisor base address (it should be within the area you reserved with memmap= earlier), a mask of CPUs assigned to the cell (for root cells, it's 0xff or all CPUs in the system), the list of memory regions and the permissions this cell has to them, I/O ports bitmap (0 marks a port as cell-accessible) and the list of PCI devices.

Each Jailhouse cell has its own config file, so you'll have one config for the root cell describing the platform Jailhouse executes on (like qemu-vm.c, as you saw above) and several others for each running cell. It's possible for inmates to share one config file (and thus one cell), but then only one of these inmates will be active at a given time.

In order to launch an inmate, you need to create its cell first:

sudo tools/jailhouse cell create configs/apic-demo.cell

apic-demo.cell is the cell configuration file that comes with Jailhouse (I also assume you still use the QEMU setup described earlier). This cell doesn't use any PCI devices, but in more complex cases, it is recommended to unload Linux drivers before moving devices to the cell with this command.

Now, the inmate image can be loaded into memory:

sudo tools/jailhouse cell load apic-demo 
 ↪inmates/demos/x86/apic-demo.bin -a 0xf0000

Jailhouse treats all inmates as opaque binaries, and although it provides a small framework to develop them faster, the only thing it needs to know about the inmate image is its base address. Jailhouse expects an inmate entry point at 0xffff0 (which is different from the x86 reset vector). apic-demo.bin is a standard demo inmate that comes with Jailhouse, and the inmate's framework linker script ensures that if the binary is mapped at 0xf0000, the entry point will be at the right address. apic-demo is just a name; it can be almost anything you want.

Finally, start the cell with:

sudo tools/jailhouse cell start apic-demo

Now, switch back to the terminal from which you run QEMU. You'll see that lines like this are being sent to the serial port:

Calibrated APIC frequency: 1000008 kHz
Timer fired, jitter:  38400 ns, min:  38400 ns, max:  38400 ns
...

apic-demo is purely a demonstrational inmate. It programs the APIC timer (found on each contemporary CPU's core) to fire at 10Hz and measures the actual time between the events happening. Jitter is the difference between the expected and actual time (the latency), and the smaller it is, the less visible (in terms of performance) the hypervisor is. Although this test isn't quite comprehensive, it is important, as Jailhouse targets real-time inmates and needs to be as lightweight as possible.

Jailhouse also provides some means for getting cell statistics. At the most basic level, there is the sysfs interface under /sys/devices/jailhouse. Several tools exist that pretty-print this data. For instance, you can list cells currently on the system with:

sudo tools/jailhouse cell list

The result is shown in Figure 3. “IMB-A180” is the root cell's name. Other cells also are listed, along with their current states and CPUs assigned. The “Failed CPUs” column contains CPU cores that triggered some fatal error (like accessing an unavailable port or unassigned memory region) and were stopped.

Figure 3. Jailhouse cell listing—the same information is available through the sysfs interface.

For more detailed statistics, run:

sudo tools/jailhouse cell stat apic-demo

You'll see something akin to Figure 4. The data is updated periodically (as with the top utility) and contains various low-level counters like the number of hypercalls issued or I/O port accesses emulated. The lifetime total and per-second values are given for each entry. It's mainly for developers, but higher numbers mean the inmate causes hypervisor involvement more often, thus degrading the performance. Ideally, these should be close to zero, as jitter in apic-demo. To exit the tool, press Q.

Figure 4. Jailhouse cell statistics give an insight into how cells communicate with the hypervisor.

Tearing It Down

Jailhouse comes with several demo inmates, not only apic-demo. Let's try something different. Stop the inmate with:

sudo tools/jailhouse cell destroy apic-demo
JAILHOUSE_CELL_DESTROY: Operation not permitted

What's the reason for this? Remember the apic-demo cell had the “running/locked” state in the cell list. Jailhouse introduces a locked state to prevent changes to the configuration. A cell that locks the hypervisor is essentially more important than the root one (think of it as doing some critical job at a power plant while Linux is mostly for management purposes on that system). Luckily, apic-demo is a toy inmate, and it unlocks Jailhouse after the first shutdown attempt, so the second one should succeed. Execute the above command one more time, and apic-demo should disappear from the cell listing.

Now, create tiny-demo cell (which is originally for tiny-demo.bin, also from the Jailhouse demo inmates set), and load 32-bit-demo.bin into it the usual way:

sudo tools/jailhouse cell create configs/tiny-demo.cell
sudo tools/jailhouse cell load tiny-demo 
 ↪inmates/demos/x86/32-bit-demo.bin -a 0xf0000
sudo tools/jailhouse cell start tiny-demo

Look at com2.txt in the host (the same directory you started QEMU from). Not only does this show that cells can be re-used by the inmates provided that they have compatible resource requirements, it also proves that Jailhouse can run 32-bit inmates (the hypervisor itself and the root cell always run in 64-bit mode).

When you are done with Jailhouse, you can disable it with:

sudo tools/jailhouse disable

For this to succeed, there must be no cells in “running/locked” state.

This is the end of our short trip to the Jailhouse. I hope you enjoyed your stay. For now, Jailhouse is not a ready-to-consume product, so you may not see an immediate use of it. However, it's actively developed and somewhat unique to the Linux ecosystem, and if you have a need for real-time application virtualization, it makes sense to keep a close eye on its progress.

Valentine Sinitsyn is a Jailhouse contributor. He has followed this project since day one, and he now works on implementing AMD systems support in the hypervisor.