Backports and long-term stable kernels

One of the longest running debates in the kernel community has to do with the backporting of patches from newer kernels to older ones. Substantial effort goes into these backports, with the resulting kernels appearing in everything from enterprise distributions to mobile devices. A recent resurgence of this debate on the Kernel Summit discussion list led to no new conclusions, but it does show how the debate has shifted over time.

Anybody wanting to use the kernel in a production setting tends to be torn between two conflicting needs. On one hand, the desire for stability and a lack of surprises argues for the use of an older kernel that has been maintained under a fixes-only policy for some time. But such kernels tend to lack features and hardware support; one needs to run a recent kernel for those. The answer that results from that conflict is backporting: pick a long-term support (LTS) stable kernel as a base, then port back the few things from newer kernels that one simply cannot do without. With luck, this process will produce a kernel that is the best of both worlds. In practice, the results are rather more mixed.
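
To make the mechanics concrete, a single backport of this kind typically amounts to cherry-picking a commit from a newer release onto the LTS branch and fixing up whatever conflicts result. The following is only a sketch; the commit ID and product branch name are hypothetical:

    # Base the product kernel on the 4.4 LTS branch.
    git checkout -b product-4.4 linux-4.4.y

    # Bring back one indispensable change from a newer mainline
    # release; "deadbeef1234" is a placeholder commit ID.
    git cherry-pick -x deadbeef1234

    # When the surrounding code has diverged, the pick will conflict;
    # the hand-resolution step is where backporting bugs creep in.
    git cherry-pick --continue

Multiply that by hundreds or thousands of commits, each resolved against a base that drifts further from mainline over time, and the mixed results described below become easier to understand.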

The problems with backporting are numerous. It will never be possible to pick out all of the important patches to port, so these kernels will always lack important fixes, some of which could leave open severe security holes. The mainline kernel benefits from widespread testing, but a backported kernel is a unique beast that certainly contains bugs introduced by the backporting process itself. The effort that goes into the creation of backport-heavy kernels is unavailable for the task of getting vendor changes upstream, with costs to both the vendor and the community as a whole. Users of highly divergent kernels are dependent on their vendor for support and updates; the community lacks the ability to help them. And so on.

Backports in the embedded industry

Alex Shi started the discussion by mentioning Linaro's stable kernel (LSK) backporting effort and asking whether there were ways that groups doing this sort of backporting could collaborate. The development community wasn't much interested in discussing backporting collaboration, though; the conversation quickly turned instead to the value of backporting efforts in general. Sasha Levin got there first, stating that "what LSK does is just crazy" and suggesting that, if vendors want the latest features and fixes, the best way to get them is to run mainline kernels. He was not alone in this opinion.

James Bottomley pointed out that the backporting happening in the embedded industry looks a lot like what the enterprise vendors did in the 2.4 kernel era. They ended up with patch sets that were, in some cases, larger than the kernel itself and were a nightmare to maintain. To get away from these issues, the kernel's development model was changed in 2.6 and the distributors focused on getting features upstream prior to shipping them. That has greatly reduced the load of patches they have to carry, allowed them to run newer kernels, and reduced fragmentation in the kernel community. Why, he asked, can't embedded vendors do the same?

From the discussion, it would seem that, while there are many reasons cited for shipping backported kernels, there is one overwhelming issue that keeps vendors stuck on that model: out-of-tree code. A typical kernel found in an embedded or consumer electronics setting has a vast pile of patches applied, the bulk of which have never made their way into a mainline kernel. Every time that a vendor moves to a new base kernel, this out-of-tree code, perhaps millions of lines of it, must be forward-ported. That is a huge effort with risks of its own. It is unsurprising that vendors will tend to delay doing that work as long as possible; if an older kernel can support a new device through the addition of a few more backported drivers, that is what they will do.
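
For a sense of what that forward-porting involves: in git terms, the operation is a rebase of the entire vendor patch stack onto the new base release. The branch and tag names in this sketch are hypothetical:

    # Vendor tree: a large stack of out-of-tree patches atop v4.1.
    git checkout vendor-4.1

    # Replay everything not in mainline onto the newer base kernel;
    # every patch that no longer applies must be reworked by hand.
    git rebase --onto v4.9 v4.1 vendor-4.1

With millions of lines of out-of-tree code, the reworking step dwarfs the git mechanics, which is why vendors put the move off for as long as they can.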

The longstanding "upstream first" policy says that these vendors should have pushed their code into the mainline before shipping it in their devices; then they would have no forward-porting issues when moving to a newer kernel. But "upstream first" has never been the rule in this part of the market. These products are designed, built, shipped, and obsoleted on an accelerated schedule; there is no space in that schedule for the process of getting code upstream, even if the process goes quickly — which is not always the case. Upstreaming after shipping can be done, but the vendor's attention has probably moved on to the next product and the business case for getting the code upstream is not always clear. Five or ten years in, when vendors find themselves struggling under millions of lines of out-of-tree code, they might have cause to wish they had worked more with the development community, but the people making these decisions rarely look that far ahead.

There was some talk about how the out-of-tree code problem could be addressed, but with few new solutions. As Linus Walleij noted, the only reliable solution seems to be customers demanding upstream support from their suppliers. He suggested that, if Google were ever to make such a requirement for its Nexus Android devices, "then things will happen". Until then, the best that can be done is to continue to talk to and pressure companies and to help them change slowly. Some of this pressure could yet take the form of changes in how stable kernels are managed.

How stable kernels fit in

While much of the conversation dwelt on the evils of backporting, another branch focused on the role that stable kernels play in that ecosystem. Vendors naturally gravitate toward kernels with long-term support, whether LSK, the Long-Term Support Initiative (LTSI) kernels, or the mainline LTS kernels, as the base for their backports; as it turns out, though, they don't use those kernels as their maintainers might wish.

As Tim Bird described, the kernels shipped in devices are often subject to more than one level of backporting. The system-on-chip (SoC) within the device will have been provided with a kernel containing plenty of backported code, but then the integrator who is building a product from that system will have another set of patches to add. The value of initiatives like LSK and LTSI is that they have reduced the number of kernel versions being provided by SoC vendors, making life easier for those doing backports at the integrator level. Projects like LTSI also list upstreaming of vendor code among their goals, and some of that has been done, but their most important role seems to be to serve as a common base for vendor-specific kernel forks.

There was a certain amount of unhappiness with how these long-term-supported kernels are used, though. An LTS kernel like 4.4 will be supported for at least two years; the LSK and LTSI kernels, based on LTS kernels, will have a similar support period. But SoC vendors are not actually making use of that support. Instead, they grab whatever version of the kernel is available at the time and simply stick with it going forward, ignoring any further updates to that kernel. Should a fix that cannot be done without (a highly publicized security fix, for example) land in the stable series they started from, the vendors will, naturally, backport it. Product vendors then take a snapshot of the SoC vendor's kernel and ignore any updates from the SoC vendor in a similar manner. This pattern has led developers like Ted Ts'o to question the value of the entire stable-kernel process and to suggest, once again, that vendors would be better off just using mainline kernels:

Why not just encourage them to get their device drivers into staging, and just go to a newer LTS kernel? Because I guarantee that's going to be less risky than taking a random collection of features, and backporting them into some ancient kernel.

Or, he said, SoC vendors could just start with a mainline release and pick their patches from subsequent releases rather than from the LTS kernel.

Time for a change?

Greg Kroah-Hartman, the maintainer of the long-term support kernels, agreed with this assessment of the situation, noting that even serious security fixes don't find their way through to the kernels shipped by vendors, despite being promptly included in the LTS kernels. So he is mulling the idea of stopping the maintenance of the LTS kernels entirely:

But if we didn't provide an LTS, would companies constantly update their kernels to newer releases to keep up with the security and bugfixes? That goes against everything those managers/PMs have ever been used to in the past, yet it's actually the best thing they could do.

Getting to the point where companies might actually see the wisdom of that approach will take some time, he acknowledged, and there will be a great deal of education required. But, he said, he has been talking to people at some vendors in the hope of improving the situation. He closed by saying there might not be a long-term support kernel next year, since it wouldn't be needed. Or, at least, "one has to dream".

In this context, it's interesting to look at this recent post from Mel Gorman, which talks about the problem of performance regressions in newer kernels. The performance hit caused by moving to a newer kernel can often be surprisingly large. It can also be difficult to fix, since it is usually the result of many patches each adding a 0.1% cost, rather than one or two big mistakes; fifty such patches compound to a slowdown of roughly 5%. The work required to get that performance back is significant, which helps him understand why vendors in general might be reluctant to move to newer kernels:

This is unrelated to the difficulties embedded vendors have when shipping a product but let's just say that I have a certain degree of sympathy when a major kernel revision is required. That said, my experience suggests that the effort required to stabilise a major release periodically is lower than carrying ever-increasing numbers of backports that get harder and harder to backport.

If embedded-systems vendors were to come to a similar conclusion, the result could be significant changes in how that part of the market works. The benefits could be huge. The upstream kernel would, hopefully, gain the best of the work that those vendors are carrying out of tree for now; the rest would be replaced with more general solutions that would better serve all users. Kernels shipped with devices would have more features and fewer bugs while being more secure than what is shipped now. It might actually become possible to run mainline kernels on these devices, opening up a range of possibilities from third-party support to simply enabling hobbyist developers to do interesting hacking on them. The considerable energy that goes into backporting now could be directed toward testing and improving the mainline kernel. And so on.

All of this seems like a distant dream at the moment, but our community has made a number of dreams come true over the years. It has also been quite successful in convincing companies that working with the community is the best approach for long-term success with Linux. Perhaps we are getting closer to a point where embedded-systems vendors will be willing to rethink their approach to the Linux kernel and find ways to work more closely with the development community that, in the end, they depend on to produce the software they ship. One does indeed have to dream.
