Python 3.14 shipped with a new incremental garbage collector. However, we’ve had a number of reports of significant memory pressure in production environments.
We’ve decided to revert it in both 3.14 and 3.15, and go back to the generational GC from 3.13.
3.15 is still in alpha, so such changes are fine. For 3.14, it is unusual for a patch release, but the old GC is a known quantity, the new incremental GC didn’t go through the PEP process, and was rolled back just before the final release of 3.13. We’ve discussed this in the core team and with the Steering Council.
If we want to reintroduce the incremental GC for 3.16, it can go through the regular PEP process and be more thoroughly evaluated.
Schedules:
- 3.15: The first beta is scheduled for 2026-05-05, just under three weeks from now. If the revert is ready to release within the next week or so, we can put out an extra alpha 9.
- 3.14: The next patch release, 3.14.5, was planned for 2026-06-09, but we’ll release it early when the revert is ready.
I’ll update this topic and the release PEPs when those dates are known.
pitrou (Antoine Pitrou) 2
Would it be possible to include both GCs and let users choose one at startup, or would that be too costly maintenance-wise?
hugovk (Hugo van Kemenade) 3
It’d be too costly. Having two GCs in 3.14 but just one in 3.13 and 3.15 would make maintenance harder, and would also be much riskier. This is the sort of thing that would need evaluating by any future PEP.
tim.one (Tim Peters) 4
While it’s been a long time since I actively worked on CPython’s gc, that sounds right to me. It’s delicate code, and parts got much harder to follow when “clever tricks” were used to allow cutting the gc object pre-header struct from 3 members to 2. The simpler we can keep that code, the better for all.
Especially in a free-threading world, where corrupted memory due to races is likely to show up billions of cycles later, when gc finally gets around to touching every container object. That’s where memory corruption due to flawed extension modules often showed up even with a GIL.
storchaka (Serhiy Storchaka) 5
But we can have two GCs in both 3.14 and 3.15. The old GC should be default, but the new one can be available as an experimental option.
Unless this makes the code much more complicated.
pitrou (Antoine Pitrou) 6
That’s what I meant as well.
@nas and I have both created branches with -X flags to toggle between GC versions. While it is doable, maintaining both versions would increase long-term maintenance overhead.
Despite this, I’m +1 on having two versions in 3.16+.
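For context on what such a toggle looks like from the Python side: `-X` options are surfaced in the `sys._xoptions` dict. A minimal sketch, assuming a hypothetical `-X gc=incremental` flag (the flag name and values here are invented for illustration, not taken from either prototype; the real selection logic would live in C):

```python
import sys

# Hypothetical: read a `-X gc=...` startup option, defaulting to the
# generational collector when the flag is absent. Invoked as e.g.
#   python -X gc=incremental myapp.py
impl = sys._xoptions.get("gc", "generational")
if impl not in ("generational", "incremental"):
    raise SystemExit(f"unknown GC implementation: {impl!r}")
print(f"selected GC implementation: {impl}")
```

Run without the flag, `impl` falls back to `"generational"`.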
nas (Neil Schemenauer) 8
In my prototype that allows choosing on startup, the incremental GC adds about 1600 lines of code. It’s not too bad in terms of maintenance, since the code is quite well separated. Keeping gc_inc.c and gc_gen.c as separate compilation units didn’t cause any measurable performance regression. Still, it is more code and more complexity, so just going back to the old one is the safe, conservative thing to do.
Given our experience with trying to introduce a new GC, we should ideally make a new one opt-in and keep the old as the default (e.g. for the 3.16 release). That’s what Java has done. OTOH, they have a team of research people working on new GCs, we have a couple people tinkering in their spare time. BTW, we do have two GC implementations already, the free-threaded one is basically separate and different.
I ran some extra timing benchmarks last night. Based on those, the incremental GC does have smaller maximum GC pause times, so if you care about that, you might prefer it. The downsides, at least when a lot of cyclic garbage is being created, are that process memory use can be dramatically higher (5x was the worst case I saw) and runtime is slower (more time spent in GC, longer total execution time). In one run, I had a 1.3 ms max pause with the incremental GC versus 26 ms with the generational GC. Peak RSS was 2.7x with the incremental GC, though.
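Pause times like these can be observed from pure Python by hooking `gc.callbacks`, which invokes registered callbacks with a `"start"`/`"stop"` phase around each collection. A rough sketch (my own, not Neil's benchmark tool), timing each collection with `perf_counter`:

```python
import gc
import time

pauses = []   # duration of each observed collection, in seconds
_start = None

def _gc_timer(phase, info):
    # gc calls this with phase "start" before and "stop" after each
    # collection; `info` holds generation/collected/uncollectable counts.
    global _start
    if phase == "start":
        _start = time.perf_counter()
    elif phase == "stop" and _start is not None:
        pauses.append(time.perf_counter() - _start)

gc.callbacks.append(_gc_timer)

# Create plenty of cyclic garbage so collections actually happen.
for _ in range(100_000):
    a = []
    a.append(a)  # a self-referential list: only the cyclic GC frees it

gc.callbacks.remove(_gc_timer)
if pauses:
    print(f"collections observed: {len(pauses)}, "
          f"max pause: {max(pauses) * 1000:.3f} ms")
```

The absolute numbers are noisy, but running the same script under two builds gives a crude "both ways" pause comparison.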
BTW, Sergey has offered to create a PR to revert to the old GC. I’ll be reviewing and I would guess it won’t take too long since we both made prototypes already.
pitrou (Antoine Pitrou) 9
FTR, is that on a real-world workload or on a specifically-designed micro-benchmark that generates tons of cycles?
nas (Neil Schemenauer) 10
It’s not real world, it’s synthetic. And it just creates a lot of cycles. Getting better real-world-like benchmarks, or more reports from people testing it on real apps, would be nice.
The pyperformance suite contains basically no interesting benchmarks in terms of exercising the cyclic GC in a realistic way (benchmark programs don’t use much memory, or don’t run for very long, or don’t create any reference cycles). It has “gc_collect” and “gc_traverse”, but these are micro-benchmarks and not at all realistic. I recently added a new one, but it doesn’t create cycles; it is intended to test the overhead of GC while you create a large object graph. If you were tuning solely based on that, your conclusion would be to make it so the GC never runs: no cycles, so it’s all overhead.
The most interesting case recently was a Sphinx slowdown. That had the two key features: creating a significant number of container objects and having at least some of those contain reference cycles. That slowdown was resolved.
In the absence of realistic benchmarks and real-world reports, I think an extensive set of synthetic benchmarks would be helpful. We can at least confirm that cyclic GC performance doesn’t degrade too much under the range of situations that those benchmarks cover.
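In that spirit, here is a minimal synthetic sketch (entirely my own construction, not Neil's actual tool; the `n_cycles`/`extra`/`live` knobs only loosely echo the cycle/extra/live columns in the result tables posted below): churn through cyclic garbage while keeping a sliding window of objects alive, and report wall time.

```python
import gc
import time
from collections import deque

def make_cycle(extra):
    # One reference cycle (the dict points at itself) plus `extra`
    # acyclic filler objects, so the heap isn't purely cycles.
    node = {"payload": [object() for _ in range(extra)]}
    node["self"] = node
    return node

def run(n_cycles=50_000, extra=2, live=1_000):
    # Keep the most recent `live` nodes alive; everything evicted from
    # the window becomes cyclic garbage only the GC can reclaim.
    keep = deque(maxlen=live)
    t0 = time.perf_counter()
    for _ in range(n_cycles):
        keep.append(make_cycle(extra))
    return time.perf_counter() - t0

elapsed = run()
collections = sum(s["collections"] for s in gc.get_stats())
print(f"elapsed: {elapsed:.3f}s, collections so far: {collections}")
```

Sweeping the three parameters across a grid, and recording time, peak RSS, and max pause for each cell, is one way to build the kind of benchmark matrix shown later in this thread.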
hugovk (Hugo van Kemenade) 11
This is the one that triggered the revert in 3.13: the revert happened a couple of days after that issue was opened.
tim.one (Tim Peters) 12
A seemingly eternal problem. Not optimistic. It’s why I said, in a different topic, that the best we can hope for is to stumble into new heuristics to worm around the real-world pathologies that pop up from time to time. Historically, almost all such cases I dealt with in sorting and pymalloc showed up first on Stack Overflow, with “random” users asking for help with “inexplicable slowdowns”.
When I was writing Python’s current sort, the only reports I got of timing results on various platforms came from people running the brief, synthetic, sortperf.py, which was part of the standard distribution at the time. Only one set of results from a real app running real data. They shared the result (“2x faster!”), but could not share their company’s data.
Quite recently I all but begged users to report just timing results (no data required) for a proposed change to the sorting algorithm. Your reply was the only one I got - and thank you for that.
Similar story for judging the string of collision resolution strategies the dict implementation has tried. Overwhelmingly driven by synthetic inputs, and all but the current strategy (which hasn’t changed in years) were eventually discarded for catastrophic behavior on rare reports from real-life apps. But in that case, it’s provable that “catastrophic” collections of keys always exist - the question is the much subtler one of “but how likely is real-life data to stumble into one?”.
Not unique to Python. Sebastian Wild, an academic who co-created the terrific “powersort” merge-ordering heuristic, has reported the extremely poor response to his persistent pleas for “real world data”. Mostly he just attracts contrived bad cases.
Whereas I early on opened an issue about seemingly quadratic-time behavior on the main branch. I had no idea gc changes were to blame at the time, and the whittled test case created essentially no cycles. It just provoked the heuristics at the time to run parts of gc far more often than reasonable.
Much of sorting has quite predictable worst-case O() behavior, but the gc context is much messier than that.
It is! No doubt about it. It’s of scant use in predicting “average” behavior (and there’s no such thing as “a typical app” to begin with), but can be of real help in fleshing out the limits of what might be seen in real life.
It does make a case for making it possible to easily try inc gc in a production release. Else the chance of getting any real-world feedback, ever, shrinks from “slim, and mostly only for catastrophic cases” to “essentially none”.
Have to play the hands we’re dealt.
gpshead (Gregory P. Smith) 13
Process wise since we know this is a large change for a patch release, do we still have the ability to do release candidates for patch releases to enable some broader testing before declaring it stable? 3.14.5rc1 for example?
tim.one (Tim Peters) 14
My take: unconditional inc gc is too risky for a patch release. The cardinal rule for those is “first do no harm”. That the change is large is less important than that:
- it’s in a fundamental area of the implementation, which affects all programs (even those that never create a cycle)
- parts of the code are subtle and delicate
- it’s an area with a long history of producing highly app-dependent performance “surprises”
The pre-release history of this change is no exception, although I think all surprises so far have been discovered by core devs (e.g., the “Sphinx report” was the result of heroic debugging efforts by @AlexWaygood).
And Neil’s synthetic tests establish without doubt that much worse surprises are still possible.
So, “too risky” for my tastes.
@nas seems to believe that a startup option to enable inc gc is doable with reasonable effort. If so, I like that:
- with luck, no visible changes by default
- supplies a way for motivated users to get real-world “both ways” results with minimal hassle, potentially greatly increasing the amount of real-world data we can hear back about. Will that actually happen? No, probably not.
When I was writing Python’s current list.sort(), for development testing I gave a patch that added it as a new method of list. Comparing “before” and “after” just required those who cared to change one letter in their Python driving code.
That was effective, and made my testing life a lot easier too.
A startup option could make life similarly easier for comparative gc investigations.
tim.one (Tim Peters) 15
I’m put off that the blurb mentions only an upside (reduced max gc pause times) and no downsides:
I look to NEWS for information, not a happy-talk sales pitch.
I personally don’t care about pause times in most of my apps. Some of them can run for days to complete, and even 5% longer would matter to me: 3 or 4 extra hours of waiting. These apps are typically doing research, and I need results to inform directions to try next. Although, ya, I typically disable gc for hours on end, in phases I know won’t be creating enough (if any) cycles to care about.
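Disabling gc around a known cycle-free phase is easy to package as a helper. A small sketch (the context-manager wrapper is my own framing, not something described in this thread); reference counting still frees acyclic objects immediately while the cyclic collector is off:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Suspend automatic cyclic collection during a phase that creates
    # few or no reference cycles, restoring the previous state after.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    # Hot phase: no reference cycles created, so the cyclic GC would
    # only add overhead here.
    totals = sum(i * i for i in range(1_000_000))
```

The trade-off: any cycles that *do* get created during the paused phase simply accumulate until gc is re-enabled.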
Not saying everyone “should be” like me in this respect, and I fully understand that reducing gc pause times is very important to some others’ apps. Am saying it’s important to be up-front about tradeoffs - and even better to discuss them broadly before they’re made. Which the PEP process would address - but somehow that seems like a very heavy process for an internal implementation change to carry.
Then again, most implementation changes don’t come with significant potential downsides.
So no easy answers here from me, just a hope that we can do better at “full disclosure” in the future.
hugovk (Hugo van Kemenade) 16
Hmm, we’ve not done that since 3.9.2rc1 in 2021. A quick demo build seemed to work, though I stopped short of uploading anything to the servers, so the whole other half of the process may have built in assumptions about there being no RC after a final release. I also didn’t try macOS or Windows builds. It would probably be a case of just doing it and fixing things up as needed.
hugovk (Hugo van Kemenade) 17
Nit pick: the Sphinx report had been discovered earlier by a contributor, but it was only during the Bellevue sprint we found the root cause. Having lots of us in the same room definitely helped! (Thanks, Meta!)
tim.one (Tim Peters) 18
And great work! Thanks for clarifying.
Another issue I haven’t seen addressed: people are already living with 3.14. Based on my conviction that “first do no harm” is the cardinal rule for patch releases, reverting inc gc can also harm them. In particular, those who’ve put in possibly substantial work to pick values for gc.set_threshold() that work well for their apps with inc gc (and doing so appears to be an effective mitigation for those whose apps were hurt by the switch to inc gc) may see that effort backfire when going back to the 3-generation collector.
While there’s no way for me to know, my impression is that most people who went down that path didn’t muck with threshold1, but reduced threshold0 to make gen 1 collections more frequent (and so collect longer-lived cycles sooner). That’s the wrong thing to do for the older 3-gen collector: it collects all the cycles there are every time a gen2 collection is done (inc gc only collects a fraction of them per gen1 try, so to “get the same effect” gen1 collections have to be done more often under inc gc).
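For reference, the tuning in question goes through `gc.get_threshold()` / `gc.set_threshold()`. An illustrative sketch (the halving is an arbitrary example, not a recommendation, and, per the caveat above, the same threshold0 value behaves differently under the incremental and generational collectors):

```python
import gc

# Inspect the current three thresholds.
t0, t1, t2 = gc.get_threshold()
print("current thresholds:", (t0, t1, t2))

# Halve threshold0 so young-generation collections trigger roughly
# twice as often; t1 and t2 are left untouched.
gc.set_threshold(max(t0 // 2, 1), t1, t2)
```

A value picked this way by profiling one collector is exactly the kind of setting that may need re-tuning after the revert.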
From that view, perhaps it’s least harmful overall to keep inc gc the default (it’s already what’s out there), but add a startup option to switch to the older 3-gen collector (acknowledging that inc gc is still delivering unpleasant surprises for some apps).
No pure win to be had here, alas.
nas (Neil Schemenauer) 19
I made this. Better than a sharp stick in the eye, I suppose. I already found a pretty serious issue with the 3.14t GC; working on a fix for that.
nas (Neil Schemenauer) 20
Some results from that tool, 3.13 vs 3.14:
base=/usr/bin/python3 vs new=./py-3.14/bin/python
cycle extra live t(s) t% rss rss% trash trash% pause pause% peaked
----------------------------------------------------------------------------------------------------------
10 0 100 1.52 +23.8 17M +29 24k +398 1.35 +68 yes
10 0 10.0k 1.72 +18.6 24M +38 181k +124 1.07 -80 yes
10 0 30.0k 1.71 +12.0 27M +14 231k +8 2.31 -77 yes
10 10.0k 100 1.88 +23.7 41M +135 24k +398 1.52 +84 yes
10 10.0k 10.0k 2.51 +22.1 198M +108 181k +124 1.70 -75 yes
10 10.0k 30.0k 2.60 +9.1 250M +13 231k +8 2.56 -82 yes
10 100.0k 100 6.17 +67.5 251M +360 24k +398 1.80 +7 yes
10 100.0k 10.0k 7.81 -2.6 1.8G +135 190k +134 1.70 -89 yes
10 100.0k 30.0k 7.96 -12.0 2.2G +13 231k +8 3.11 -91 yes
10 300.0k 100 17.09 +15.3 717M +469 24k +398 2.47 -27 yes
10 300.0k 10.0k 19.02 -4.0 5.1G +125 181k +124 1.96 -93 yes
10 300.0k 30.0k 19.42 -7.5 6.5G +14 231k +8 2.86 -92 yes
100 0 100 1.13 +22.9 17M +30 28k +431 1.27 +47 yes
100 0 10.0k 1.27 +21.1 25M +41 191k +135 1.54 -63 yes
100 0 30.0k 1.29 +16.4 27M +24 231k +8 1.79 -80 yes
100 10.0k 100 1.18 +22.0 20M +44 28k +431 1.29 +45 yes
100 10.0k 10.0k 1.35 +20.2 43M +77 191k +135 1.40 -68 yes
100 10.0k 30.0k 1.44 +20.9 49M +15 231k +8 2.38 -77 yes
100 100.0k 100 2.19 +29.3 38M +138 28k +431 1.72 +81 yes
100 100.0k 10.0k 1.89 +24.3 206M +121 191k +135 1.57 -70 yes
100 100.0k 30.0k 1.97 +2.4 247M +14 231k +8 2.06 -85 yes
100 300.0k 100 3.49 +31.2 53M +142 28k +431 2.06 +96 yes
100 300.0k 10.0k 3.06 +10.3 567M +130 191k +135 1.16 -88 yes
100 300.0k 30.0k 3.19 +1.4 683M +15 231k +8 2.47 -80 yes
1.0k 0 100 1.10 +16.2 21M +48 111k +383 0.84 -41 yes
1.0k 0 10.0k 1.24 +20.2 26M +45 207k +123 1.47 -71 yes
1.0k 0 30.0k 1.27 +17.7 29M +31 268k +20 2.29 -74 yes
1.0k 10.0k 100 1.20 +28.4 22M +52 111k +383 1.27 +3 yes
1.0k 10.0k 10.0k 1.28 +23.6 28M +48 207k +123 1.44 -67 yes
1.0k 10.0k 30.0k 1.26 +11.6 31M +30 268k +20 1.53 -86 yes
1.0k 100.0k 100 1.18 +13.8 31M +102 111k +383 2.80 +75 yes
1.0k 100.0k 10.0k 1.28 +22.7 45M +71 207k +123 1.09 -68 yes
1.0k 100.0k 30.0k 1.30 +11.1 53M +26 268k +20 1.50 -82 yes
1.0k 300.0k 100 1.26 -0.3 50M +196 111k +383 1.35 -0 yes
1.0k 300.0k 10.0k 1.45 +26.4 84M +93 207k +123 1.51 -68 yes
1.0k 300.0k 30.0k 1.48 +12.1 102M +27 268k +20 2.17 -83 yes
Legend (base vs new, matched by cycle/extra/live):
t(s) total time for new build
t% percent change in time vs base, (new-base)/base*100
rss peak RSS for new build
rss% percent change in peak RSS vs base
trash max uncollected cyclic-garbage for new build
trash% percent change in max trash vs base
pause max GC pause (ms) for new build
pause% percent change in max GC pause vs base
peaked yes if new build RSS and trash peaked before final 25% of run
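As a sanity check on reading the tables, the percent columns are plain relative change; e.g., the first row of the table above reports t(s)=1.52 and t%=+23.8, implying a base time of roughly 1.52 / 1.238 ≈ 1.23 s:

```python
def pct_change(new, base):
    # (new - base) / base * 100, as used for the t%, rss%, trash%,
    # and pause% columns; positive means the new build's value is larger.
    return (new - base) / base * 100

# First row of the 3.13-vs-3.14 table: new time 1.52 s against an
# implied base of about 1.228 s recovers the reported +23.8%.
print(round(pct_change(1.52, 1.228), 1))  # → 23.8
```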
And then 3.13 vs 3.14t (there is a bug with RSS-based deferral of full collections):
base=/usr/bin/python3 vs new=./py-3.14t/bin/python
cycle extra live t(s) t% rss rss% trash trash% pause pause% peaked
----------------------------------------------------------------------------------------------------------
10 0 100 1.33 +8.3 34M +150 93k +1786 4.66 +479 yes
10 0 10.0k 1.34 -7.3 34M +89 100k +24 4.51 -15 yes
10 0 30.0k 1.36 -10.9 36M +51 130k -40 6.09 -40 yes
10 10.0k 100 2.22 +46.3 126M +628 91k +1746 6.81 +726 yes
10 10.0k 10.0k 2.25 +9.2 138M +44 100k +24 7.78 +14 yes
10 10.0k 30.0k 2.23 -6.8 170M -24 130k -40 7.86 -45 yes
10 100.0k 100 8.09 +119.7 1.2G +2084 91k +1746 7.36 +339 yes
10 100.0k 10.0k 8.13 +1.4 1.3G +70 100k +24 7.44 -50 yes
10 100.0k 30.0k 8.13 -10.2 1.6G -17 125k -42 8.75 -76 yes
10 300.0k 100 19.83 +33.8 2.8G +2194 91k +1746 8.10 +138 yes
10 300.0k 10.0k 19.76 -0.2 3.1G +38 100k +24 8.39 -71 yes
10 300.0k 30.0k 20.24 -3.6 3.9G -32 125k -42 10.68 -69 yes
100 0 100 1.01 +9.0 34M +150 93k +1683 4.04 +368 yes
100 0 10.0k 1.01 -3.5 34M +90 100k +23 4.71 +13 yes
100 0 30.0k 1.02 -8.1 36M +65 131k -39 5.40 -40 yes
100 10.0k 100 1.10 +13.7 42M +202 91k +1642 6.69 +656 yes
100 10.0k 10.0k 1.07 -4.7 42M +71 100k +23 5.46 +25 yes
100 10.0k 30.0k 1.08 -9.8 48M +12 130k -40 5.54 -46 yes
100 100.0k 100 1.71 +1.1 146M +814 91k +1642 5.21 +448 yes
100 100.0k 10.0k 1.72 +13.3 172M +84 100k +23 5.57 +8 yes
100 100.0k 30.0k 1.71 -11.3 189M -13 125k -42 6.51 -51 yes
100 300.0k 100 2.96 +11.3 332M +1399 91k +1642 5.35 +410 yes
100 300.0k 10.0k 2.95 +6.1 364M +47 100k +23 8.84 -7 yes
100 300.0k 30.0k 2.94 -6.5 460M -23 130k -39 7.42 -40 yes
1.0k 0 100 1.02 +7.4 34M +133 93k +304 4.15 +189 yes
1.0k 0 10.0k 1.03 -0.1 34M +89 100k +8 4.41 -12 yes
1.0k 0 30.0k 1.05 -2.9 36M +64 130k -42 5.09 -42 yes
1.0k 10.0k 100 1.03 +10.4 34M +130 93k +304 4.51 +267 yes
1.0k 10.0k 10.0k 1.03 -0.8 34M +80 100k +8 4.50 +3 yes
1.0k 10.0k 30.0k 1.07 -5.7 36M +50 129k -42 5.76 -48 yes
1.0k 100.0k 100 1.06 +2.4 44M +185 93k +304 4.52 +182 yes
1.0k 100.0k 10.0k 1.07 +2.4 44M +65 100k +8 4.47 +33 yes
1.0k 100.0k 30.0k 1.09 -7.1 50M +18 124k -44 5.19 -36 yes
1.0k 300.0k 100 1.22 -2.9 76M +351 93k +304 4.64 +241 yes
1.0k 300.0k 10.0k 1.25 +9.1 76M +73 100k +8 5.18 +11 yes
1.0k 300.0k 30.0k 1.25 -5.3 76M -6 125k -44 5.78 -55 yes
After my fix is applied to 3.14t (fixing the RSS deferral bug):
base=/usr/bin/python3 vs new=/home/nas/src/cpython/python
cycle extra live t(s) t% rss rss% trash trash% pause pause% peaked
----------------------------------------------------------------------------------------------------------
10 0 100 2.00 +62.7 27M +102 23k +364 1.83 +127 yes
10 0 10.0k 2.04 +41.2 27M +53 34k -58 1.77 -67 yes
10 0 30.0k 2.08 +36.5 29M +24 47k -78 2.59 -75 yes
10 10.0k 100 2.87 +88.8 31M +80 6k +20 1.22 +48 yes
10 10.0k 10.0k 3.06 +48.9 43M -55 18k -78 1.24 -82 yes
10 10.0k 30.0k 3.31 +38.8 73M -67 42k -81 2.67 -81 yes
10 100.0k 100 8.59 +133.4 105M +93 6k +20 1.27 -24 yes
10 100.0k 10.0k 9.03 +12.6 265M -66 18k -78 1.72 -89 yes
10 100.0k 30.0k 9.42 +4.1 583M -70 42k -81 3.57 -90 yes
10 300.0k 100 20.48 +38.2 211M +67 6k +20 2.04 -40 yes
10 300.0k 10.0k 20.90 +5.5 616M -73 18k -78 2.13 -93 yes
10 300.0k 30.0k 21.20 +1.0 1.3G -77 42k -81 3.82 -89 yes
100 0 100 1.40 +51.5 27M +102 23k +338 1.52 +76 yes
100 0 10.0k 1.45 +38.1 27M +54 34k -58 2.27 -46 yes
100 0 30.0k 1.45 +30.4 29M +35 47k -78 2.54 -72 yes
100 10.0k 100 1.86 +93.3 27M +97 12k +121 1.18 +33 yes
100 10.0k 10.0k 1.96 +74.4 27M +11 17k -79 1.08 -75 yes
100 10.0k 30.0k 1.94 +62.4 31M -27 42k -81 2.09 -79 yes
100 100.0k 100 2.22 +31.1 33M +108 6k +13 1.00 +5 yes
100 100.0k 10.0k 2.24 +47.6 47M -49 17k -79 1.05 -80 yes
100 100.0k 30.0k 2.39 +24.1 73M -66 42k -81 2.37 -82 yes
100 300.0k 100 2.85 +7.0 45M +104 6k +13 1.42 +35 yes
100 300.0k 10.0k 3.45 +24.4 83M -66 18k -78 1.16 -88 yes
100 300.0k 30.0k 3.63 +15.5 169M -72 42k -81 2.43 -80 yes
1.0k 0 100 1.45 +53.8 27M +88 25k +9 1.54 +7 yes
1.0k 0 10.0k 1.47 +43.0 27M +53 34k -64 1.69 -66 yes
1.0k 0 30.0k 1.48 +37.1 29M +34 47k -79 1.85 -79 yes
1.0k 10.0k 100 1.51 +61.8 27M +85 19k -17 1.23 -0 yes
1.0k 10.0k 10.0k 1.51 +45.3 27M +45 30k -68 1.83 -58 yes
1.0k 10.0k 30.0k 1.54 +36.6 27M +14 56k -75 2.03 -82 yes
1.0k 100.0k 100 1.97 +90.5 27M +77 11k -52 1.26 -21 yes
1.0k 100.0k 10.0k 1.88 +81.1 29M +10 17k -82 1.01 -70 yes
1.0k 100.0k 30.0k 1.93 +64.9 31M -26 41k -82 1.96 -76 yes
1.0k 300.0k 100 2.05 +62.3 29M +73 7k -70 1.13 -17 yes
1.0k 300.0k 10.0k 2.05 +78.1 33M -24 18k -81 1.43 -69 yes
1.0k 300.0k 30.0k 1.98 +50.3 41M -49 41k -82 2.01 -84 yes