A 'frozen' dictionary for Python

7 min read Original article ↗

Dictionaries are ubiquitous in Python code; they are the data structure of choice for a wide variety of tasks. But dictionaries are mutable, which makes them problematic for sharing data in concurrent code. Python has added various concurrency features to the language over the last decade or so—async, free threading without the global interpreter lock (GIL), and independent subinterpreters—but users must work out their own solution for an immutable dictionary that can be safely shared by concurrent code. There are existing modules that could be used, but a recent proposal, PEP 814 ("Add frozendict built-in type"), looks to bring the feature to the language itself.

Victor Stinner announced the PEP that he and Donghee Na have authored in a post to the PEPs category of the Python discussion forum on November 13. The idea has come up before, including in PEP 416, which has essentially the same title as 814 and was authored by Stinner back in 2012. It was rejected by Guido van Rossum at the time, in part due to its target: a Python sandbox that never really panned out.

frozendict

The idea is fairly straightforward: add frozendict as a new immutable type to the language's builtins module. As Stinner put it:

We expect frozendict to be safe by design, as it prevents any unintended modifications. This addition benefits not only CPython's standard library, but also third-party maintainers who can take advantage of a reliable, immutable dictionary type.

While frozendict has a lot in common with the dict built-in type, it is not a subclass of dict; instead, it is a subclass of the base object type. The frozendict() constructor can be used to create one in various ways:

    fd = frozendict()           # empty
    fd = frozendict(a=1, b=2)   # frozen { 'a' : 1, 'b' : 2 }
    d = { 'a' : 1, 'b' : 2 }
    fd = frozendict(d)          # same
    l = [ ( 'a', 1 ), ( 'b', 2 ) ]
    fd = frozendict(l)          # same
    fd2 = frozendict(fd)        # same
    assert d == fd == fd2       # True

As with dictionaries, the keys for a frozendict must be immutable, thus hashable, but the values may or may not be. For example, a list is a legitimate type for a value in either type of dictionary, but it is mutable, making the dictionary as a whole (frozen or not) mutable. However, if all of the values stored in a frozendict are immutable, it is also immutable, so it can be hashed and used in places where that is required (e.g. dictionary keys, set elements, or entries in a functools.lru_cache).

LWN.net is able to bring you articles like this one because of our generous subscribers. If you want to see more like it, consider taking advantage of our special offer: 1 month trial subscription

As might be guessed, based on the last line of the example above, frozen dictionaries that are hashable can be compared for equality with other dictionaries of either type. In addition, neither the hash() value nor the equality test depend on the insertion order of the dictionary, though that order is preserved in a frozen dictionary (as it is in the regular variety). So:

    d = { 'a' : 1, 'b' : 2 }
    fd = frozendict(d)
    d2 = { 'b' : 2, 'a' : 1 }
    fd2 = frozendict(d2)
    assert d == d2 == fd == fd2

    # frozendict unions work too, from the PEP
    >>> frozendict(x=1) | frozendict(y=1)
    frozendict({'x': 1, 'y': 1})
    >>> frozendict(x=1) | dict(y=1)
    frozendict({'x': 1, 'y': 1})

For the unions, a new frozen dictionary is created in both cases; the "

|=

" union-assignment operator also works by generating a new

frozendict

for the result.

Iteration over a frozendict works as expected; the type implements the collections.abc.Mapping abstract base class, so .items() returns an iterable of key-value tuples, while .keys() and .values() provide the keys and values of the frozen dictionary. For the most part, a frozendict acts like a dict that cannot change; the specific differences between the two are listed in the PEP. It also contains a lengthy list of places in the standard library where a dict could be switched to a frozendict to "enhance safety and prevent unintended modifications".

Discussion

The reaction to the PEP was generally positive, with the usual suggestions for tweaks and more substantive additions to the proposal. Stinner kept the discussion focused on the proposal at hand for the most part. One part of the proposal was troubling to some: converting a dict to a frozendict was described as an O(n) shallow copy. Daniel F Moisset thought that it would make sense to have an in-place transformation that could be O(1) instead. He proposed adding a .freeze() method that would essentially just change the type of a dict object to frozendict.

However, changing the type of an existing object is fraught with peril, as Brett Cannon described:

But now you have made that dictionary frozen for everyone who holds a reference to it, which means side-effects at a distance in a way that could be unexpected (e.g. context switch in a thread and now suddenly you're going to get an exception trying to mutate what was a dict a microsecond ago but is now frozen). That seems like asking for really nasty debugging issues just to optimize some creation time.

The PEP is not aimed at performance, he continued, but is meant to help "lessen bugs in concurrent code". Moisset noted, that dictionaries can already change in unexpected ways via .clear() or .update(), thus the debugging issues already exist. He recognized that the authors may not want to tackle that as part of the PEP, but wanted to try to ensure that an O(1) transformation was not precluded in the future.

Cannon's strong objection is to changing the type of the object directly. Ben Hsing and "Nice Zombies" proposed ways to construct a new frozendict without requiring the shallow copy—thus O(1)—by either moving the hash table to a newly created frozendict, while clearing the dictionary, or by using a copy-on-write scheme for the table. As Steve Dower noted, that optimization can be added later as long as the PEP does not specify that the operation must be O(n), which would be a silly thing to do, but that it sometimes happens "because it makes people stop complaining", he said in a footnote. In light of the discussion, the PEP specifically defers that optimization to a later time, suggesting that it could also be done for other frozen types (tuple and frozenset), perhaps by resurrecting PEP 351 ("The freeze protocol").

On December 1, Stinner announced that the PEP had been submitted to the steering council for pronouncement. Given that Na is on the council, though will presumably recuse himself from deciding on this PEP, he probably has a pretty good sense for how it might be received by the group. So it seems likely that the PEP has a good chance of being approved. The availability of the free-threaded version of the language (i.e. without the GIL) means that more multithreaded Python programs are being created, so having a safe way to share dictionaries between threads will be a boon.


Index entries for this article
PythonDictionaries
PythonPython Enhancement Proposals (PEP)/PEP 814