diff -u

What's New in Kernel Development by Zack Brown

Unit Testing

Brendan Higgins recently proposed adding unit tests to the Linux kernel, supplementing other development infrastructure such as perf, autotest and kselftest. The whole issue of testing is very dear to kernel developers' hearts, because Linux sits at the core of the system and has very strong stability and security requirements. Hosts of automated tests regularly churn through kernel source code, reporting any oddities to the mailing list.

Unit tests, Brendan said, specialize in testing standalone code snippets. It was not necessary to run a whole kernel, or even to compile the kernel source tree, in order to perform unit tests. The code to be tested could be completely extracted from the tree and tested independently. Among other benefits, this meant that dozens of unit tests could be performed in less than a second, he explained.

Giving credit where credit was due, Brendan identified JUnit, Python's unittest.mock and Googletest/Googlemock for C++ as the inspirations for this new KUnit testing idea.

Brendan also pointed out that since all code being unit-tested is standalone and has no dependencies, this meant the tests also were deterministic. Unlike on a running Linux system, where any number of pieces of the running system might be responsible for a given problem, unit tests would identify problem code with repeatable certainty.

Daniel Vetter replied extremely enthusiastically to Brendan's work. In particular, he said, "Having proper and standardized infrastructure for kernel unit tests sounds terrific. In other words: I want." He added that he and some others already had been working on a much more specialized set of unit tests for the Direct Rendering Manager (DRM) driver. Brendan's approach, he said, would be much more convenient than his own more localized efforts.

Dan Williams was also very excited about Brendan's work, and he said he had been doing a half-way job of unit tests on the libnvdimm (non-volatile device) project code. He felt Brendan's work was much more general-purpose, and he wanted to convert his own tests to use KUnit.

Tim Bird replied to Brendan's initial email as well, saying he thought unit tests could be useful, but he wanted to make sure the behaviors were correct. In particular, he wanted clarification on just how it was possible to test standalone code. If the code were to be compiled independently, would it then run on the local system? What if the local system had a different hardware architecture from the system for which the code was intended? Also, who would maintain unit tests, and where would the tests live, within the source tree? Would they clutter up the directory being tested, or would they live far away in a special directory reserved for test code? And finally, would test code be easier to write than the code being tested? In other words, could new developers cut their teeth on a project by writing test code, as a gateway to helping work on a given driver or subsystem? Or would unit tests have to be written by people who had total expertise in the area already?

Brendan attempted to address each of those issues in turn. To start, he confirmed that the test code was indeed extracted and compiled on the local system. Eventually, he said, each test would compile into its own completely independent test binary, although for the moment, they were all lumped together into a single User-Mode Linux (UML) binary.

In terms of cross-compiling test code for other architectures, Brendan felt this would be hard to maintain and had decided not to support it. Tests would run locally and would not depend on architecture-specific characteristics.

In terms of where the unit tests would live, Brendan said they would be in the same directory as the code being tested. So every directory would have its own set of unit tests readily available and visible. The same person maintaining the code being tested would maintain the tests themselves. The unit tests, essentially, would become an additional element of every project. That maintainer would then presumably require that all patches to that driver or subsystem pass all the unit tests before they could be accepted into the tree.

In terms of who was qualified to write unit tests for a given project, Brendan explained:

In order to write a unit test, the person who writes the test must understand what the code they are testing is supposed to do. To some extent that will probably require someone with some expertise to ensure that the test makes sense, and indeed a change that breaks a test should be accompanied by an update to the test. On the other hand, I think understanding what pre-existing code does and is supposed to do is much easier than writing new code from scratch, and probably doesn't require too much expertise.

Brendan added that unit tests would probably reduce, rather than increase, a maintainer's workload, in spite of representing more code overall:

Code with unit tests is usually cleaner, the tests tell me exactly what the code is supposed to do, and I can run the tests (or ideally have an automated service run the tests) that tell me that the code actually does what the tests say it should. Even when it comes to writing code, I find that writing code with unit tests ends up saving me time.

Overall, Brendan was very pleased by all the positive interest, and said he planned to do additional releases to address the various technical suggestions that came up during the course of discussion. No voices really were raised in opposition to any of Brendan's ideas. It appears that unit tests may soon become a standard part of many drivers and subsystems.

Ditching Out-of-Date Documentation Infrastructure

Long ago, the Linux kernel started using 00-INDEX files to list the contents of each documentation directory. This was intended to explain what each of those files documented. Henrik Austad recently pointed out that those files have been out of date for a very long time and were probably not used by anyone anymore. This is nothing new. Henrik said in his post that this had been discussed already for years, "and they have since then grown further out of date, so perhaps it is time to just throw them out."

He counted hundreds of instances where a 00-INDEX file was out of date or missing when it should have been present. He posted a patch to rip them all unceremoniously out of the kernel.

Joe Perches was very pleased with this. He pointed out that .rst files (the kernel's native documentation format) had largely taken over the original purpose of those 00-INDEX files. He said the 00-INDEX files were by now even misleading.

Jonathan Corbet was more reserved. He felt Henrik should distribute the patch among a wider audience and see if it got any resistance. He added:

I've not yet decided whether I think this is a good idea or not. We certainly don't need those files for stuff that's in the RST doctree, that's what the index.rst files are for. But I suspect some people might complain about losing them for the rest of the content. I do get patches from people updating them, so some folks do indeed look at them.

Henrik told Jonathan he was happy to update the 00-INDEX files if that would be preferable. But he didn't want to do that if the right answer was just to get rid of them.

Meanwhile, Josh Triplett saw no reason to keep the 00-index files around at all. He remarked, "I was *briefly* tempted, reading through the files, to suggest ensuring that the one-line descriptions from the 00-INDEX files end up in the documents themselves, but the more I think about it, I don't think even that is worth anyone's time to do."

Paul Moore also voiced his support for removing the 00-INDEX files, at least the ones for NetLabel, which was his area of interest.

The discussion ended there. It's nice that even for apparently obvious patches, the developers still take the time to consider various perspectives and to carry any remaining value over from the old approach to the new one. It's especially nice to see this sort of attention given to documentation patches, which tend to get left out in the cold in coding projects.

Non-Child Process Exit Notification Support

Daniel Colascione submitted some code to support processes knowing when others have terminated. Normally a process can tell when its own child processes have ended, but not unrelated processes, or at least not trivially. Daniel's patch created a new file in the /proc directory entry for each process—a file called "exithand" that is readable by any other process. If the target process is still running, attempts to read() its exithand file will simply block, forcing the querying process to wait. When the target process ends, the read() operation will complete, and the querying process will thereby know that the target process has ended.

It may not be immediately obvious why such a thing would be useful. After all, non-child processes are by definition unrelated. Why would the kernel want to support them keeping tabs on each other? Daniel gave a concrete example, saying:

Android's lmkd kills processes in order to free memory in response to various memory pressure signals. It's desirable to wait until a killed process actually exits before moving on (if needed) to killing the next process. Since the processes that lmkd kills are not lmkd's children, lmkd currently lacks a way to wait for a process to actually die after being sent SIGKILL.

Daniel explained that on Android, the lmkd process currently would simply keep checking the proc directory for the existence of each process it tried to kill. By implementing this new interface, instead of continually polling the process, lmkd could simply wait until the read() operation completed, thus saving the CPU cycles needed for continuous polling.

And more generally, Daniel said in a later email:

I want to get polling loops out of the system. Polling loops are bad for wakeup attribution, bad for power, bad for priority inheritance, and bad for latency. There's no right answer to the question "How long should I wait before checking $CONDITION again?". If we can have an explicit waitqueue interface to something, we should. Besides, PID polling is vulnerable to PID reuse, whereas this mechanism (just like anything based on struct pid) is immune to it.

Joel Fernandes suggested, as an alternative, using ptrace() to get the process exit notifications, instead of creating a whole new file under /proc. Daniel explained:

Only one process can ptrace a given process at a time, so I don't like ptrace as a mechanism for anything except debugging. Relying on ptrace for exit notification would interfere with things like debuggers and crash dump collection systems. Besides, ptrace can do too much (like read and write process memory) and so requires very strong privileges not necessary for this mechanism. Besides: ptrace's interface is complicated and relies on repeated calls to various wait functions, whereas the interface in this patch is simple enough to use from the shell.

The issue of PID (process ID) reuse came up again, because it wasn't clear to everyone that a whole new file in the /proc directory was the best way to solve the problem. As David Laight said, Linux used a reference counter on all PIDs, so that any reuse could be seen. He figured the /proc directory should include some way to expose that reference count.

Other operating system kernels have other ways of trying to avoid PID reuse, or at least to mitigate its downsides. As Joel explained:

If you look at the NetBSD pid allocator you'll see that it uses the low pid bits to index an array and the high bits as a sequence number. The array slots are also reused LIFO, so you always need a significant number of pid allocate/free before a number is reused. The non-sequential allocation also makes it significantly more difficult to predict when a pid will be reused. The table size is doubled when it gets nearly full.

But to this, Daniel replied:

NetBSD is still just papering over the problem. The real issue is that the whole PID-based process API model is unsafe, and a clever PID allocator doesn't address the fundamental race condition. As long as PID reuse is possible at all, there's a potential race condition, and correctness depends on hope. The only way you could address the PID race problem while not changing the Unix process API is by making pid_t ridiculously wide so that it never wraps around.

Elsewhere, Aleksa Sarai was still unconvinced that a whole new file in the /proc directory would be a good thing, if there were a way to avoid it. Aleksa understood that Daniel wanted to avoid continuous polling, but felt there were still workable alternatives. For example, Aleksa said, "When you open /proc/$pid, you already have a handle for the underlying process, and you can already poll to check whether the process has died (fstatat fails for instance). What if we just used an inotify event to tell userspace that the process has died—to avoid userspace doing a poll loop?"

Daniel replied that Aleksa's solution was far more complicated than Daniel's. He said that inotify and related APIs were:

...intended for broad monitoring of system activity, not for waiting for some specific event. They require a substantial amount of setup code, and since both are event-streaming APIs with buffers that can overflow, both need some logic for userspace to detect buffer overrun and fall back to explicit scanning if that happens. They're also an optional part of the kernel.

Daniel went on:

Given that we *can*, cheaply, provide a clean and consistent API to userspace, why would we want to inflict some exotic and hard-to-use interface on userspace instead? Asking that userspace poll on a directory file descriptor and, when poll returns, check by looking for certain errors (we'd have to spec which ones) from fstatat is awkward. /proc/pid is a directory. In what other context does the kernel ask userspace to use a directory this way?

The debate went on, with no resolution on the mailing list. Daniel continued to insist that his approach was simpler than any of the proposed alternatives, and he also felt it was in keeping with the spirit of UNIX itself. At one point, he explained:

The basic unix data access model is that a userspace application wants information (e.g., next bunch of bytes in a file, next packet from a socket, next signal from a signal FD, etc.), and tells the kernel so by making a system call on a file descriptor. Ordinarily, the kernel returns to userspace with the requested information when it's available, potentially after blocking until the information is available. Sometimes userspace doesn't want to block, so it adds O_NONBLOCK to the open file mode, and in this mode, the kernel can tell the userspace requestor "try again later", but the source of truth is still that ordinarily-blocking system call. How does userspace know when to try again in the "try again later" case? By using select/poll/epoll/whatever, which suggests a good time for that "try again later" retry, but is not dispositive about it, since that ordinarily-blocking system call is still the sole source of truth, and that poll is allowed to report spurious readability. This model works fine and has a ton of mental and technical infrastructure built around it. It's the one the system uses for almost every bit of information useful to an application.

The opposition to Daniel's patch seems to emanate from the desire to avoid adding new files to /proc. There's a real risk of /proc, and other kernel interfaces, growing bloated, overly complex and unmaintainable over time. Linus Torvalds and other top contributors want to avoid this, especially since it is very difficult to remove interfaces once they are implemented. Once user software starts to rely on a given interface, there's a great reluctance in Linux to break that software. One reason for this is that not all software is open source, and older closed-source tools may not be maintained, and thus may not have the option to adapt to any new interface. A change in something they rely on may mean the software simply can't be used with newer kernels. The kernel developers want to avoid that situation if at all possible.

It's unclear whether Daniel's patch will go into the tree in its current form, given the opposition. It may be that user code—the Android OS in this case—will for now have to continue using other, more complicated ways of knowing when processes have died.

About the Author

Zack Brown is a tech journalist at Linux Journal and Linux Magazine, and is a former author of the "Kernel Traffic" weekly newsletter and the "Learn Plover" stenographic typing tutorials. He first installed Slackware Linux in 1993 on his 386 with 8 megs of RAM and had his mind permanently blown by the Open Source community. He is the inventor of the Crumble pure strategy board game, which you can make yourself with a few pieces of cardboard. He also enjoys writing fiction, attempting animation, reforming Labanotation, designing and sewing his own clothes, learning French and spending time with friends'n'family.

Zack Brown