Jonathan Corbet wrote a document for inclusion in the kernel tree, describing best practices for merging and rebasing git-based kernel repositories. As he put it, it represented workflows that were actually in current use, and it was a living document that hopefully would be added to and corrected over time.
The inspiration for the document came from noticing how frequently Linus Torvalds was unhappy with how other people—typically subsystem maintainers—handled their git trees.
It's interesting to note that before Linus wrote the git tool, branching and merging was virtually unheard of in the Open Source world. In CVS, it was a nightmare horror of leechcraft and broken magic. Other tools were not much better. One of the primary motivations behind git—aside from blazing speed—was, in fact, to make branching and merging trivial operations—and so they have become.
One of the offshoots of branching and merging, Jonathan wrote, was rebasing—altering the patch history of a local repository. The benefits of rebasing are fantastic. They can make a repository history cleaner and clearer, which in turn can make it easier to track down the patches that introduced a given bug. So rebasing has a direct value to the development process.
On the other hand, used poorly, rebasing can make a big mess. For example, suppose you rebase a repository that has already been merged with another, and then merge them again—insane soul death.
So Jonathan explained some good rules of thumb. Never rebase a repository that's already been shared. Never rebase patches that come from someone else's repository. And in general, simply never rebase—unless there's a genuine reason.
Since rebasing changes the history of patches, it relies on a new "base" version, from which the later patches diverge. Jonathan recommended choosing a base version that was generally thought to be more stable rather than less—a new version or a release candidate, for example, rather than just an arbitrary patch during regular development.
Jonathan also recommended, for any rebase, treating all the rebased patches as new code, and testing them thoroughly, even if they had been tested already prior to the rebase.
"If", he said, "rebasing is limited to private trees, commits are based on a well-known starting point, and they are well tested, the potential for trouble is low."
Moving on to merging, Jonathan pointed out that nearly 9% of all kernel commits were merges. There were more than 1,000 merge requests in the 5.1 development cycle alone.
He pointed out in the doc that, although "many projects require that branches in pull requests be based on the current trunk so that no merge commits appear in the history", the kernel had no such requirement. Merges were considered perfectly orderly ways of doing business, and developers should not try to rebase their branches to avoid merges.
An interesting thing about kernel development is that the hierarchy of maintainership tends to favor a hierarchy of git repository maintainers. It's not uncommon for one or a few people to manage a branched kernel repository, and to have developers managing branches of that tree, with other developers in turn managing branches of those.
For mid-level maintainers, Jonathan pointed out, there are two relevant situations: merging a tree from lower in the hierarchy into your own and merging your own tree higher up toward Linus' top-level tree.
Jonathan recommended that for mid-level maintainers accepting merges from lower trees, maintainers not seek to hide the merge request and, in fact, should add a commit message or changelog entry, explaining the patches that went into the merge.
Jonathan also pointed out that the "Signed-Off-By" tags were crucial elements of commit messages that helped track responsibility as well as important debugging information. He suggested that all maintainers should continue to use them and to verify them when merging from other trees. Jonathan said, "Failure to do so threatens the security of the development process as a whole."
That advice referred to downstream trees, but Jonathan had some very interesting points to make about merging from upstream trees. This is when you're working on your tree, and you want to make sure you're up to date with the latest-and-greatest tree from Linus or someone close to him. Of course, doing so would make your own life slightly easier, because you'd be up to date, you could test your code against the tip of the tree, and so on. Still, Jonathan counseled against it.
For one thing, you could be bringing other people's bugs into your own tree, destabilizing your test code, and then you'd have the uncertainty of knowing that your code was actually solid and ready to submit further upstream.
Another temptation is to do a merge from the upstream source right before submitting your own merge request to ensure that your request won't encounter any conflicts. However, as Jonathan said, "Linus is adamant that he would much rather see merge conflicts than unnecessary back merges. Seeing the conflicts lets him know where potential problem areas are. He does a lot of merges (382 in the 5.1 development cycle) and has gotten quite good at conflict resolution—often better than the developers involved."
Instead, if you do notice a conflict that will show up when Linus does the merge, you should say something about it in the pull request, so Linus sees that you see the situation.
As a last resort, for particularly nutty cases, Jonathan said, you could create another branch, with your own conflict resolutions, and point Linus to that so he can see how you'd resolve the problems yourself. The pull request, however, should be for the unresolved branch.
Doing a test merge in that way is fine, he said. It helps you know if there will be any conflicts, so you can communicate better to the upstream maintainers.
He offered some more good advice and closed by saying:
The guidelines laid out above are just that: guidelines. There will always be situations that call out for a different solution, and these guidelines should not prevent developers from doing the right thing when the need arises. But one should always think about whether the need has truly arisen and be prepared to explain why something abnormal needs to be done.
And...Linus replied that he liked the whole doc.
David Rientjes from Google reported that he actually had been in the process of writing an internal doc for use by Google engineers, discussing this very topic. He was thrilled that Jonathan had done a better job explaining it than his own effort.
Geert Uytterhoeven also liked the new doc, and he offered some spelling and grammar corrections.
Only Theodore Ts'o had any significant criticism to offer. He felt a clear distinction should be made between reordering patches (which he felt was what most people thought of when they talked about rebasing), versus actually changing or removing commits that have already gone into the tree. Both were technically rebasing, yet both were really quite different operations.
Jonathan replied to this, suggesting that maybe the doc could refer separately to "rebasing" and "history modification". And, Ted agreed this would be better.
End of thread. I love seeing this sort of documentation go into the kernel. It's not the same as general-purpose git advice, because it's specific to kernel development processes and policies that are already in place. At the same time, it's potentially very useful to other large-scale projects that might want to mimic the Linux kernel development process. All open-source projects essentially mimic the kernel development process anyway—Linus is the one who first discovered and popularized the methods of how to run an open-source project—and there tends to be a lot of wisdom in his development policy decisions even now.
David Howells put in quite a bit of work on a script, ./scripts/syscall-manage.pl, to simplify the entire process of changing the system call tables. With this script, it was a simple matter to add, remove, rename or renumber any system call you liked. The script also would resolve git conflicts, in the event that two repositories renumbered the system calls in conflicting ways.
Why did David need to write this patch? Why weren't system calls already fairly easy to manage? When you make a system call, you add it to a master list, and then you add it to the system call "tables", which is where the running kernel looks up which kernel function corresponds to which system call number. Kernel developers need to make sure system calls are represented in all relevant spots in the source tree. Renaming, renumbering and making other changes to system calls involves a lot of fiddly little details. David's script simply would do everything right—end of story no problemo hasta la vista.
Arnd Bergmann remarked, "Ah, fun. You had already threatened to add that script in the past. The implementation of course looks fine, I was just hoping we could instead eliminate the need for it first." But, bowing to necessity, Arnd offered some technical suggestions for improvements to the patch.
However, Linus Torvalds swooped in at this particular moment, saying:
Ugh, I hate it.
I'm sure the script is all kinds of clever and useful, but I really think the solution is not this kind of helper script, but simply that we should work at not having each architecture add new system calls individually in the first place.
IOW, we should look at having just one unified table for new system call numbers, and aim for the per-architecture ones to be for "legacy numbering".
Maybe that won't happen, but in the _hope_ that it happens, I really would prefer that people not work at making scripts for the current nasty situation.
And the portcullis came crashing down.
It's interesting that, instead of accepting this relatively obvious improvement to the existing situation, Linus would rather leave it broken and ugly, so that someone someday somewhere might be motivated to do the harder-yet-better fix. And, it's all the more interesting given how extreme the current problem is. Without actually being broken, the situation requires developers to put in a tremendous amount of care and effort into something that David's script could make trivial and easy. Even for such an obviously "good" patch, Linus gives thought to the policy and cultural implications, and the future motivations of other people working in that region of code.
A minor fix, but an interesting exchange: Thierry Reding posted a patch to the core driver code to eliminate an unnecessary warning so users wouldn't get confused and think it was important.
It all started one day long ago and far away with the probe()
function. The kernel generally calls probe()
to trigger a device initialization and get some basic information about it for use by kernel operations. This generally happens very early in the boot process, as soon as the device comes online—or later, if it's a hotplug device.
But some drivers, Thierry pointed out, had to defer the relevant kernel probe()
call until the resources needed by that driver had been initialized. If they didn't get initialized—if they were a hotplug device, for example—then the driver that depends on them might need to defer the probe()
call indefinitely. Thierry remarked:
One example of this can be seen on Tegra, where the DPAUX hardware contains pinmuxing controls for pins that it shares with an I2C controller. The I2C controller is typically used for communication with a monitor over HDMI (DDC). However, other instances of the I2C controller are used to access system critical components, such as a PMIC. The I2C controller driver will therefore usually be a built-in driver, whereas the DPAUX driver is part of the display driver that is loaded from a module to avoid bloating the kernel image with all of the DRM/KMS subsystem.
In this particular case the pins used by this I2C/DDC controller become accessible very late in the boot process. However, since the controller is only used in conjunction with display, that's not an issue.
In other words, deferring probe()
in this case is perfectly fine, for as long as it takes for DPAUX actually to come up. The delay should be considered a regular part of normal kernel operation. As Thierry went on to say, "unfortunately the driver core currently outputs a warning message when a device fails to get the pinctrl before the end of the init stage. That can be confusing for the user because it may sound like an unwanted error occurred, whereas it's really an expected and harmless situation."
Thierry's patch added a flag to the driver_deferred_probe_check_state()
helper function to let callers indicate they want to continue to defer probe()
.
Rob Herring liked the patch, and Rafael J. Wysocki offered some constructive technical criticism.
Greg Kroah-Hartman, on the other hand, was disgruntled.
He and Thierry had apparently had a discussion on this topic before, because he accused Thierry of not following his requirements. Specifically, Greg had said that Thierry should not use "odd flags". He said an earlier version of the patch had used a boolean flag, and now it used a bitmap. To which he remarked, "That did not make the api any easier to understand at all." And Greg concluded, "Anyway, this isn't ok, do it correctly please, like I asked for the first time."
Thierry was unfazed by this rebuke, and he pointed out that Greg had really said "no boolean flag", and Thierry had diligently replaced the boolean flag with a bitmap.
Thierry went on, "To avoid further back and forth, what exactly is it that you would have me do? That is, what do you consider to be the correct way to do this?"
He offered to avoid using flags of any kind and simply rely on function return values. And, Rafael proposed a set of changes that might accomplish this. The main point of Rafael's suggestion is that very clearly named functions would check the state of a given driver and return a very clear value indicating that yes, indeed, the driver would continue to defer the probe()
call. The idea being that now, no longer might the error message confuse people at runtime, but the code itself would not confuse people at development time.
Greg took a look at Rafael's suggestion and replied, "Yes, that's much more sane. Self-describing apis are the key here, I did not want a boolean flag, or any other flag, as part of the public api as they do not describe what the call does at all."
And that was the end of the thread. Presumably, Thierry doesn't mind the new direction, as long as his itch gets scratched, and the unnecessary warning no longer appears.
The interesting thing about this exchange is that Greg's initial requirement was vague, or at least ambiguous, and Thierry just barrelled ahead, adhering to the letter of it without guessing at the deeper significance (clear code). Then finally when Greg got steamed up about it, the opportunity arose to get some clarification from him. I'm not entirely sure Thierry didn't implement his patch specifically to draw out that clarification. Anyway, it did the trick, and the next incarnation of the patch almost certainly will go straight into the tree.
Note: if you're mentioned in this article and want to send a response, please send a message with your response text to ljeditor@linuxjournal.com, and we'll run it in the next Letters section and post it on the website as an addendum to the original article.