Probably the biggest change for Python over the last five years or so is the advent of the "free-threaded" version of the language, which removes the global interpreter lock (GIL) and allows multiple threads to run in parallel in the interpreter. At PyCon US 2026, held in Long Beach, California in mid-May, longtime CPython core developer (and current steering council member) Thomas Wouters gave a talk about the feature. He looked at the motivation behind the GIL-removal efforts, some history, the current status of the free-threaded interpreter, and provided a prediction on where it all leads.
He began by noting that he has been doing CPython core development for
about 25 years at this point and has been on the steering council for five
of the last six years. The steering council is the body that determines the
path forward for language features, including free threading. Beyond that,
he works for Meta on the free-threaded interpreter and other things. While
it was not entirely relevant to the talk, he noted that he has three cats,
while putting up slides to show them. "In an alternate universe, there's
a version of this talk where I use my cats as my slides", he said to
laughter and applause.
Motivation
But, this is not that talk, he said; "this is a boring talk". While
he noted that most in the audience probably already knew, he quickly
introduced threads. They provide a way to execute multiple things at the
same time in a single process and its address space using separate
"threads of control". The primary reason that threads exist is for
performance; memory access is slow, so threads are a way to give the CPU
something else to do while waiting for memory.
The same benefit can come from using multiple processes, but there is more
overhead and the address spaces are not shared. There are some additional
reasons, beyond performance, that a language like Python needs threads:
for example, to continue executing the main program while calling into
blocking APIs, or to interact with a third-party library that requires
using threads to access it, "commonly databases".
The GIL is "how CPython decided to support threads" a long time ago,
predating his involvement with the language. The GIL protects Python
objects and their reference counts, which are used to determine if the
objects are still in use; it also protects CPython internals and is "the
most efficient way that CPython can support threads". The GIL does not
protect a user's Python code because the user does not have control over
when the interpreter releases and reacquires the GIL. It "mostly does
not protect" C and C++ extensions either, even though it is clearer
when an extension call might release the GIL, "but it's still very easy
to unintentionally rely on the GIL and end up not being safe".
"Fundamentally, threads are hard"; they are complicated and the GIL
does not make them any easier, Wouters said, even though it seems like it
does at times. But the GIL also makes threads less useful, which is why
there have been efforts to remove it from Python over the years. When the
GIL was first added, multi-CPU systems were rare; now his phone has eight
cores.
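The earlier point that the GIL does not protect a user's Python code can be illustrated with a minimal sketch; it is not from the talk, and the names `UnsafeCounter`, `LockedCounter`, and `hammer()` are hypothetical. The unlocked increment is a read-modify-write that threads can interleave, so updates may be lost with or without the GIL (it is simply more likely on a free-threaded build); the locked version gives the same answer everywhere.

```python
import threading


class UnsafeCounter:
    """Hypothetical counter with an unprotected read-modify-write."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # Load, add, and store are separate steps; another thread may
        # run between them, so an update can be lost.
        self.value += 1


class LockedCounter:
    """Same counter, with the update guarded by an explicit lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def hammer(counter, n_threads=4, n_increments=10_000):
    """Increment the counter from several threads and return the total."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print("locked:", hammer(LockedCounter()))
    print("unsafe:", hammer(UnsafeCounter()))  # may be lower than 40000
```

How often the unsafe version actually loses updates depends on the interpreter version and build, which is part of why such bugs can stay hidden for a long time.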
For a long time, the alternative has been to rewrite parts of the program in C, though for the last few years, that has been changed to Rust, which is better. That answer does not always work, however; it means that much of the application needs to be reworked to not only switch the code to the new language, but also to switch the data into that language's domain.
There are other options, such as using multiple processes, but that typically
requires more memory and copying data between the processes. Subinterpreters
will allow a single process to have multiple interpreters, each with its own
GIL, that can be run in separate threads, but there are some problems with
that approach. The interaction with third-party libraries, especially those
that are not written with Python in mind, "isn't there yet" and there is no
good way, so far, to copy data between the subinterpreters. There is also
asyncio, which is effectively the same idea as green threads. He is
a big proponent of asyncio and thinks it is the best way to do
network I/O, "where threads can easily lead you wrong".
But sometimes those solutions do not provide the performance needed, he
said. "Without the GIL, multi-threaded solutions to those things can
offer higher throughput, lower memory use, and lower latency." But, as
mentioned, threads are difficult. That is because sharing data is hard to
do. CPUs and compilers optimize for speed and, because memory access is
slow, they do various things like caching, prefetching data, and
reordering memory accesses.
The interaction between those techniques and threads is "extremely
complicated". If a thread writes a value, when does another thread see
it, and see all of the value, not just half of what was written? If a
thread writes two things, in what order does another thread see them?
Handling those problems requires things like memory fences, atomic
operations, and memory models, which he was not going to cover. His
takeaway: "Shared, mutable data in threads is bad."
The problem for Python is that everything is an object, and everything is
shared. Every object has a reference count, which is a small bit of
mutable data in every object, "so everything is shared mutable data"
for Python. The CPython C API relies on the GIL in multiple ways, Wouters
said. For example, PyDict_GetItem() returns a borrowed
reference to an item in a dictionary, which is not a problem if the GIL
ensures that the object does not change while the code, say, increases the
reference count so that it has its own (not borrowed) reference. Without
the GIL, the object can change at any time; that will probably not cause
a problem often, but code that takes its own reference this way
is relying on the GIL for correctness.
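The reference count can be observed from Python itself, which may help make the "everything is shared mutable data" point concrete. The sketch below is CPython-specific and only looks at differences between readings, since the absolute number reported by `sys.getrefcount()` includes temporary references and varies between versions; it does not show the C API or borrowed references.

```python
import sys

obj = object()
holders = []

# Absolute values are an implementation detail; only the change matters.
before = sys.getrefcount(obj)

# Storing the object in a container writes to the object's own
# reference count, even though the object is otherwise untouched.
holders.append(obj)
after_add = sys.getrefcount(obj)

# Dropping that reference writes to the count again.
holders.clear()
after_clear = sys.getrefcount(obj)

print(after_add - before, after_clear - before)
```

Merely holding or releasing a reference is a write to the object, which is why even "read-only" sharing between threads involves mutable state.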
Locks can be used to avoid these kinds of problems, but they are expensive
and would need to be used all over the place. Python has lists,
dictionaries, tuples, code objects, and so on that are "everywhere, the
whole process is full of these"; replacing reference counting with a
lock would be "extremely expensive". Even just making
PyDict_GetItem() use a lock would be expensive and the locks would
be present in all Python code, even if it is not using threads. "We
can't slow down the interpreter by 50% in order to make multiple threads
executing at the same time that much faster", especially since there
are no users out there running Python workloads with multiple threads,
"because they can't".
History
There have been efforts to remove the GIL going all the way back to Greg Stein's 1996 patch for Python 1.4; it replaced reference counting with a fine-grained lock and added other locks to various parts of CPython. It was not successful because the single-threaded-performance loss was not acceptable.
In 2013, Trent Nelson started PyParallel, which was a completely different model that did not use operating-system threads. It had a different API to create threads from within Python to do various tasks; it deferred reference counting and garbage collection until those threads were done. It wasn't a general solution because it did not work with anything that already used operating-system threads.
In 2015, Larry Hastings started the Gilectomy
project, which he worked on for a few years. He looked into novel approaches
of dealing with the reference count, specifically, but also into what else
would be entailed in removing the GIL from the language. He "pretty
much gave up", though there were some other avenues he wanted to
explore, Wouters said.
A few years ago, "Sam Gross, who works at Meta, thought he could
solve most of those problems and he did so" in a No-GIL fork of
CPython 3.9. "That's what we have now, more or less." Gross wrote
PEP 703 ("Making the
Global Interpreter Lock Optional in CPython"), which described various
techniques needed in CPython to support free threading and remove the GIL.
For example, objects in Python now have an "owner" thread, which simply
means that the thread has fast access to the object and its reference
count; other threads can still access the object but they must take a slower
path, which uses atomic operations on a separate, shared reference count.
In addition, some reference-count operations are deferred until
garbage-collection time and then multiple operations are handled at once;
that constitutes a semantic change, however, because objects that would
have been reclaimed immediately when the count reached zero will be
deferred until the next garbage-collection run. "It's a small
difference, it's probably acceptable."
There are also speculative reference-count operations when
accessing list and dictionary members. An item object will have its
reference count incremented and then CPython will check to see if it is
still present in the list or dictionary; "that sounds really weird"
and perhaps dangerous since the object may have been destroyed, he said.
There are safeguards and it is done under controlled circumstances; it is
not for general use, but is "magic in the interpreter itself".
"Quiescent-state-based reclamation" is used to ensure that objects
do not disappear out from under other threads.
When lists and dictionaries are freed, their memory is not reused
immediately so that other
threads can still access the reference counts on the objects. The
memory is only cleared when the language knows that it is safe to do so,
which is generally when garbage collection is done, because all of the
threads have reached a quiescent state at that point.
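Because some reclamation is now deferred until garbage-collection time, as described above, code that counts on an object being finalized the instant its last reference disappears is leaning on an implementation detail. A minimal sketch of the usual alternative, explicit cleanup through a context manager, follows; the `TempBuffer` class is hypothetical and stands in for anything that holds a resource.

```python
class TempBuffer:
    """Hypothetical resource holder with explicit, deterministic cleanup."""

    def __init__(self):
        self.closed = False
        self.data = bytearray()

    def close(self):
        # Release the resource here rather than in __del__, so the timing
        # does not depend on when the interpreter reclaims the object.
        self.data.clear()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with TempBuffer() as buf:
    buf.data.extend(b"scratch")

# Cleanup has already happened, whatever the reference count is doing.
print(buf.closed)
```

This pattern behaves the same on GIL and free-threaded builds, and on other Python implementations that never promised immediate reclamation.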
Fine-grained locks were added as well; "some things just need a
lock". A new
garbage collector was added because the existing one was complicated.
The memory allocator was switched to
mimalloc, which is a thread-safe allocator that also provides
the hooks needed for the quiescent-state-based reclamation.
Putting all of that together results in lock-free lists, dictionaries, and
"other important types", he said, which makes access fast, especially for
the owning thread; access from other threads is still "pretty fast".
When performing actions like changing the type of an
object, there is a need to stop all threads to ensure that is done
correctly; "we stop the world, then we make the changes, then we run
them all again".
There are also per-object locks that provide critical
sections. These are not regular locks, "they are a special type of
lock that's deadlock-free". He noted that those who had worked with
locks would likely find that term to be an oxymoron. It works for CPython,
though, because the semantics of the GIL are recreated in the critical
section. The existing calls for acquiring and releasing the GIL are
repurposed to track which threads are not blocking in an operation so that
the "stop the world" operation can complete; threads that are going to
block should already be releasing the GIL before doing so.
The end result is that "in free-threaded Python, thread semantics are
almost the same as in GIL-ful Python". It avoids the "scary
memory model, atomics, memory-order issues, you don't have memory fences,
none of that is important, Python just behaves like Python".
Wouters noted that the term "free threaded" did not come from Gross, but from the steering council. It had been called the "No-GIL" Python, but the council did not want to end up having people talk about the "No No-GIL Python". Wouters said that "free threading" was not an industry term, it simply is a Python term that means that the GIL has been removed. It is also a bit of a misnomer, because free threading in Python is much less free than in C or C++, where you can write all over memory without restriction, or even Rust, which is better in that regard, but still has freer threading than Python.
Status
Python 3.13, which was released in October 2024, had experimental
free-threading support. It was "relatively slow, with 20-40% slowdown
on single-threaded workloads". It had thread-safety issues and some
scalability problems, where adding more threads could make things go
slower. "But it proved that it worked."
For 3.14, from October 2025, free-threaded Python was "much safer"
and "much faster" with a 0-10% slowdown for single-threaded
workloads. "I'm still personally shocked that we got to 0% slowdown,
that's on Arm hardware, there's some magic going on there, I don't know
what it is, but it's amazing." On Linux using GCC, it's usually around
5%, though it can go up to 10%, particularly with older compilers.
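A program can check which kind of interpreter it is running on. The sketch below uses two facilities that exist in CPython 3.13 and later: the `Py_GIL_DISABLED` build variable exposed through `sysconfig`, and `sys._is_gil_enabled()`, which reports whether the GIL is active right now (a free-threaded build can still run with the GIL enabled). The helper name `gil_status()` is hypothetical, and the fallback for older interpreters is an assumption that they always have a GIL.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and if the GIL is on."""
    # Truthy only on builds configured for free threading (3.13+).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in 3.13; older interpreters lack it
    # and are assumed here to always run with the GIL.
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(is_enabled()) if is_enabled is not None else True

    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


if __name__ == "__main__":
    print(gil_status())
```

On a free-threaded build, the GIL can be controlled at startup with the `-X gil=0` or `-X gil=1` command-line option, or the `PYTHON_GIL` environment variable.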
He was not part of the steering council that approved PEP 779 ("Criteria for supported status for free-threaded Python"), which he co-authored, for 3.14. It described what needed to be done to remove the "experimental" tag from the free-threaded build and turn it into a supported feature for the language. The PEP acceptance announcement removed the experimental designation and described what the free-threaded developers needed to do moving forward.
Coming in October this year is Python 3.15, which will have a single, stable ABI for both GIL and free-threaded versions of the interpreter. That means extension developers can build their code once and it will load and run on Python 3.15 or later that are using either the GIL or free threading. There are also a lot of scalability improvements for free threading coming in 3.15.
Extension developers may be daunted by the prospect of adding
free-threading support, but developers at Quansight have put together
a guide to Python free threading. It includes information on porting
extensions to the free-threaded interpreter. "It's not that hard."
C global variables need to be protected because the GIL is no longer doing
so, if it ever actually was. "You can also just get rid of your C globals,
that's not a bad idea." Critical sections can be used to protect
mutable data in Python objects. If many threads will be accessing an
object at once, other kinds of locks may be desirable, though that can be
left as a later optimization.
In addition, extension modules need to declare that they support free
threading. There are a number of ways to do that; "check the
guide". It is important not to remove any of the existing GIL-related
calls in the extension as those are used by the free-threaded interpreter.
Adding some multi-threaded tests is also a good idea, Wouters said; "if
you have threading bugs, they will show up, because you no longer have the
GIL accidentally protecting you some of the time".
For regular Python code, there is probably not much, if anything, that needs
to be done, unless there are already multi-threading bugs in the code;
the free-threaded version will make those more likely to occur. Once again,
adding some simple tests will likely "expose them that much
quicker". The free-threaded interpreter makes it easier to find those
kinds of bugs.
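What a "simple test" of this kind might look like is sketched below; it is an illustration rather than anything shown in the talk. The helper `run_concurrently()` and the test function are hypothetical names. A `threading.Barrier` releases all workers at once to maximize contention, and any exception raised in a worker is collected so that the test can fail on it.

```python
import threading


def run_concurrently(func, n_threads=8):
    """Call func(index) from n_threads threads that all start together."""
    barrier = threading.Barrier(n_threads)
    errors = []

    def worker(index):
        try:
            barrier.wait()  # line the threads up so they collide
            func(index)
        except Exception as exc:  # surface failures to the caller
            errors.append(exc)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


def test_shared_dict_inserts():
    """Many threads insert distinct keys into one shared dictionary."""
    shared = {}

    def insert(index):
        for i in range(1000):
            shared[(index, i)] = i

    errors = run_concurrently(insert)
    assert errors == []
    assert len(shared) == 8 * 1000


if __name__ == "__main__":
    test_shared_dict_inserts()
    print("ok")
```

Replacing the dictionary insert with a call into the code under test gives a quick smoke test; running it repeatedly, and on a free-threaded build, makes a latent race more likely to appear.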
Free threading will make using threads in Python more attractive, though, which may
also expose bugs in existing code. Making Python code scale well may
require more extensive changes. Having many threads accessing the same
objects is a problem that Python cannot solve directly. "Sharing
objects between threads is still problematic for performance." There
are some possible solutions, including the ft_utils
package, which has some scalable container types.
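One general way to sidestep contention on shared objects, independent of any particular package, is to give each thread its own private accumulator and merge the results once at the end. The sketch below uses only the standard library and does not show the ft_utils API; `count_chunk()` and `count_words()` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(words):
    """Count words into a Counter that only this call ever touches."""
    local = Counter()
    for word in words:
        local[word] += 1
    return local


def count_words(words, n_workers=4):
    """Split the input, count each part in a thread, then merge once."""
    chunks = [words[i::n_workers] for i in range(n_workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Merging happens in the calling thread, so no lock is needed
        # and the workers never write to a shared object.
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    print(count_words(["a", "b", "a", "c", "a", "b"]))
```

The workers share only the read-only input; whether this scales better than a single shared counter depends on the workload and should be measured rather than assumed.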
Future
There is still work to do on free-threaded Python, Wouters said. There are
performance and scalability improvements to be made in the interpreter and
in third-party packages that already support free threading. Beyond that,
working on supporting free threading in more third-party packages is
needed, though it is not necessary to consider free-threaded Python a
success. "It's already a success", with support by more than 50% of the top
360 binary wheels on the Python Package
Index (PyPI). Much of the credit for that goes to Meta and Quansight,
which not only worked on that support, but also helped educate package
authors; there were, of course, other contributors to that effort, he said.
While he is on the steering council, "I'm not saying this with any
authority whatsoever": he believes that the free-threaded version will
become the default Python in an upcoming release. He suspects that the
Python project will lag behind Linux distributors, such as Red Hat and
Debian, and large companies like Meta, that will have already enabled free
threading as the default. His prediction is that it will happen sometime
after 3.16 (due October 2027) and before 3.20 (October 2031).
At some point after that, the GIL-based version will go away entirely, he
thinks. "This is far into the future, this is next decade, that's what
I expect. But, you know, I might be surprised." That ended his talk,
but he still had a few minutes for questions.
One of those was about mixing modules that support free threading with
those that do not. Wouters said that modules without support for free
threading will currently cause the interpreter to emit a warning and
re-enable the GIL. That can be worked around with a flag to always disable
the GIL, "but that is playing with fire, so you probably only want to do
that for testing".
Hastings stepped up to fill in "why Larry Hastings abandoned the
Gilectomy", which was met with widespread laughter. He had been
trying to use some techniques that were arguably somewhat down the path
that Gross eventually took, but ran into huge scalability and performance
problems that Hastings found to be unsolvable. He said that Gross had
kindly informed him that the technique used for reference counts in the
free-threaded interpreter "had not been invented" when Hastings was
working on Gilectomy—to more laughter. Wouters got the last word by noting
that he had considered putting that anecdote into the talk but did not want
to imply that Hastings was not smart enough to invent it himself.
[ I would like to thank the Linux Foundation, LWN's travel sponsor, for its assistance with my trip to Long Beach for PyCon US. ]