A fork for the time-zone database?

A controversy about the handling of the Time Zone Database (tzdb) has been brewing since May, but has come to a head in recent weeks. Changes that were proposed to simplify the main database file have some consequences in terms of time-zone history and changes to the representation of some zones. Those changes have upset a number of users of the database—to the point where some have called for a fork. A September 25 release of tzdb with some, but not all, of the changes seems unlikely to resolve the conflict.

The time-zone database is meant to track time-zone information worldwide for time periods starting at the Unix epoch of January 1, 1970. But, over the years, it has accumulated a lot of data on time zones and policies (e.g. daylight saving time) going back many years before the epoch. As with anything that governments and politicians get involved with, which time zone a country (or part of a larger country) is in, whether it observes daylight saving time (DST), and when the DST switches are made are all arbitrary and subject to change, seemingly at whim. Since 1986 or so, tzdb has kept up with these changes so that computer programs can handle time correctly; it was long known as the "Olson database" after its founder, Arthur David Olson.

Merging time zones

Back in May, tzdb maintainer Paul Eggert proposed two changes, one of which was winnowed out during the discussion. The other was to merge zones (identified by location strings, such as "Europe/Berlin") that have the same post-1970 history under a single name. The entries that got merged out would be maintained as links to the merged zone and their pre-1970 history would be moved into the backzone file that is also distributed with the database. Which zone name would be the "winner" is based on the rules embodied in the tzdb theory document; the most populous choice would be the main zone, while the others would be relegated to the status of links.
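The merge rule amounts to grouping zones by their post-1970 history and keeping the most populous member of each group as the Zone, with the rest becoming Links. The Java sketch below illustrates only that idea, under stated assumptions: the `ZoneInfo` record, the placeholder history strings, and the population numbers are all hypothetical and invented for illustration. They are not tzdb's file format, its actual population data, or the logic of any tzdb tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the merge rule; not tzdb's data format or tooling.
public class ZoneMerge {
    // A zone name, a placeholder summary of its post-1970 history, and a
    // placeholder population figure (not real data). Records need Java 16+.
    record ZoneInfo(String name, String post1970History, long population) {}

    // Returns a map from each merged-out zone to the zone it would link to.
    static Map<String, String> merge(List<ZoneInfo> zones) {
        // Group zones whose post-1970 histories are identical.
        Map<String, List<ZoneInfo>> groups = new LinkedHashMap<>();
        for (ZoneInfo z : zones) {
            groups.computeIfAbsent(z.post1970History(), k -> new ArrayList<>()).add(z);
        }
        Map<String, String> links = new TreeMap<>();
        for (List<ZoneInfo> group : groups.values()) {
            // The most populous zone in each group stays a Zone...
            ZoneInfo winner = Collections.max(group, Comparator.comparingLong(ZoneInfo::population));
            // ...and every other member of the group becomes a Link to it.
            for (ZoneInfo z : group) {
                if (!z.name().equals(winner.name())) {
                    links.put(z.name(), winner.name());
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<ZoneInfo> zones = List.of(
            new ZoneInfo("Europe/Berlin", "same-post-1970", 3),
            new ZoneInfo("Europe/Stockholm", "same-post-1970", 2),
            new ZoneInfo("Europe/Oslo", "same-post-1970", 1),
            new ZoneInfo("Asia/Kolkata", "different", 10));
        // prints {Europe/Oslo=Europe/Berlin, Europe/Stockholm=Europe/Berlin}
        System.out.println(merge(zones));
    }
}
```

Note that the pre-1970 history plays no part in the grouping at all; that is exactly the information the objections below are about.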

The maintainer of the Joda-Time library for date and time handling in Java, Stephen Colebourne, disagreed strongly with the change and asked that it be reverted. One of the main problems is that programs using tzdb and looking at pre-1970 dates for places that lost out in the merge will get incorrect time-zone information. The pre-1970 time-zone information for those regions will, effectively, be lost, he said:

[...] many places (eg Anguilla, Antigua and Aruba) are now sharing time-zone history, and that history is **from some other zone**. That seems completely unacceptable.

While I understand the motivation to remove the burden of pre-1970, that cannot come at the cost of giving a place the history of somewhere completely different.

In another message, he pointed out other problems that he saw with the changes, which are partly political in nature:

For example, Norway's and Sweden's time zone history is being wiped out in favour of that of Germany. Can no-one here see the political sensitivity in that?

This has a very serious impact on Joda-Time because it normalizes time-zone IDs. (It treats a Link as the key to the normalization, so anything at the weak end of a Link is replaced by the ID at the strong end. You might complain that it shouldn't do that, but it has operated that way for 20 years...)

This code:

  DateTimeZone zone = DateTimeZone.forID("Europe/Stockholm");
  System.out.println(zone);
will print "Europe/Berlin" if this change is not reverted. I consider this to be catastrophic.
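To make the Joda-Time concern concrete, here is a hypothetical sketch of Link-based normalization as Colebourne describes it: an ID at the weak end of a Link is replaced by the ID at the strong end. The `LinkNormalizer` class and its link tables are invented for illustration; this is not Joda-Time's implementation, only a model of the behavior he outlines.

```java
import java.util.Map;

// Hypothetical model of Link-based ID normalization; not Joda-Time's code.
public class LinkNormalizer {
    private final Map<String, String> links; // alias ID -> target ID

    LinkNormalizer(Map<String, String> links) {
        this.links = Map.copyOf(links);
    }

    // Follows Links until reaching an ID that is not itself an alias.
    String normalize(String id) {
        String current = id;
        // Bound the hops so that a malformed, cyclic table cannot loop forever.
        for (int hops = 0; hops <= links.size(); hops++) {
            String target = links.get(current);
            if (target == null) {
                return current;
            }
            current = target;
        }
        throw new IllegalStateException("Link cycle involving " + id);
    }

    public static void main(String[] args) {
        // Before the change: Stockholm is a Zone in its own right.
        LinkNormalizer before = new LinkNormalizer(Map.of());
        // After the proposed change: Stockholm becomes a Link to Berlin.
        LinkNormalizer after = new LinkNormalizer(Map.of("Europe/Stockholm", "Europe/Berlin"));
        System.out.println(before.normalize("Europe/Stockholm")); // prints Europe/Stockholm
        System.out.println(after.normalize("Europe/Stockholm"));  // prints Europe/Berlin
    }
}
```

In this model nothing about Stockholm's own data has to change for its ID to vanish from program output; turning the entry into a Link is enough.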

Since Berlin and Stockholm (Oslo, also, as mentioned elsewhere) share the same post-1970 time-zone history, they would all get merged under "Europe/Berlin" as Berlin is the most populous. The pre-1970 history for those countries (Sweden and Norway) would get moved to the backzone file; the data is still available, but many applications have gotten used to getting accurate historical information without consulting backzone. For his part, Eggert is trying to solve an unfairness problem within the database:

Why should we maintain Norway and Sweden's time zone histories, when we don't maintain the histories for Guangdong, KwaZulu-Natal, Thanh Hóa, or Uttar Pradesh? Aside from politics, these regions are similar: although all the regions have distinct timestamp histories with data that I can cite, all the regions can be merged into other tzdb regions (Norway into Berlin, Guangdong into Shanghai, etc.) if we consistently limit tzdb's scope to regions that differ after 1970. Given all that, why should Norway and Sweden continue to be special?

These are not particularly-obscure examples, as Guangdong etc. all have more people than Norway or Sweden do. It would be political to continue to focus on Norway and Sweden while excluding Guangdong etc. purely for reasons unrelated to timekeeping.

He also pointed out that Joda-Time already has to deal with these kinds of merges; earlier releases, including the 2021a release from January, have merged zones. "Whatever techniques people use for these longstanding links should also work for the new links." Eggert is convinced that he is solving a real problem here:

The current patch was not prompted by purism. It was prompted by a complaint from a user who made a good point about the politics of tzdb 2021a, which can reasonably be interpreted to favor countries like Norway etc. over countries like Kosovo etc. Rejecting this kind of complaint and saying "we've always done it that way" is not a promising path forward.

While he believes that users will not really be affected by the changes and that the merge process (which has been ongoing for a number of years in a slower, less-visible manner) is working well, some disagreed. Derick Rethans, who maintains the date and time handling for PHP, Hack, and MongoDB, said that the cleanups are making things worse, in part because they ignore backward compatibility. Colebourne was even more blunt:

Let me be clear - this change cannot stand. The reliability of TZDB has declined considerably over the past few years, but it is time to say enough is enough. This is where the line in the sand needs to be drawn.

Part of the problem is that backzone currently contains a fair amount of poor-quality data that was shifted out of the main database long ago for that reason. Moving well-researched historical data (for, say, Sweden) into that file makes it difficult to distinguish the good data from the bad. Eggert said that, currently, the database can be built either with or without the backzone data, but it's an all-or-nothing choice; that could perhaps change moving forward:

For example, if a downstream user wants the 'backzone' entry for Europe/Stockholm which is well-documented, but doesn't want backzone's America/Montreal entry because it's not well-attested and is most likely wrong, the user could specify a list of backzone names that includes Europe/Stockholm but excludes America/Montreal. I think it would not be too much work to add something like this to the tzdb code.
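Eggert's suggestion might look roughly like the following sketch, which treats a build as a table mapping names to definitions and lets backzone entries override main-data Links only for names on an allowlist. Everything here (the `SelectiveBackzone` class, its `build` method, and the strings standing in for Zone and Link entries) is hypothetical; it is not the tzdb Makefile or zic. The entries model the May proposal, under which Europe/Stockholm would be a Link.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a selective backzone build; not tzdb's build system.
public class SelectiveBackzone {
    // Builds the final name -> definition table from the main data, letting
    // backzone entries replace main-data entries only for allowlisted names.
    static Map<String, String> build(Map<String, String> mainData,
                                     Map<String, String> backzone,
                                     Set<String> allowlist) {
        Map<String, String> result = new TreeMap<>(mainData);
        for (String name : allowlist) {
            String entry = backzone.get(name);
            if (entry != null) {
                result.put(name, entry); // backzone Zone overrides the main-data Link
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mainData = Map.of(
            "Europe/Berlin", "Zone (Berlin history)",
            "Europe/Stockholm", "Link -> Europe/Berlin",
            "America/Toronto", "Zone (Toronto history)",
            "America/Montreal", "Link -> America/Toronto");
        Map<String, String> backzone = Map.of(
            "Europe/Stockholm", "Zone (well-documented Stockholm history)",
            "America/Montreal", "Zone (poorly attested Montreal history)");
        // Stockholm is included; Montreal is deliberately left out.
        Set<String> allowlist = Set.of("Europe/Stockholm");

        build(mainData, backzone, allowlist)
            .forEach((name, def) -> System.out.println(name + ": " + def));
    }
}
```

The flexibility cuts both ways: two distributors that pick different allowlists would give different answers for the same pre-1970 timestamp.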

A problem with that approach is that applications may just generally consult whatever tzdb the operating system has installed. Today that means they will get proper time-zone information for, say, Norway on pre-1970 dates, but down the road they will not, unless the operating system builds a version that includes some of the data from backzone. Different choices of exactly which data to include could easily create incompatibilities between systems for pre-1970 dates.

Charter breach?

On June 3, Colebourne formally requested that the time-zone merging be reverted, arguing that it breached the charter set out in RFC 6557. For one thing, the TZ Coordinator (i.e. Eggert) "has not taken into account the views of the mailing list" as required by the charter. Furthermore, the cleanups are not within the scope of the charter, he said.

Multiple people spoke up in support of Colebourne's message, though not all of them agreed that Eggert's plans were a breach of the charter. The clear consensus in that thread, though, was that the changes should be reverted so that some other solution could be found. Maintainers of the date and time code for multiple projects opposed the changes, though not (or at least not yet) as an official position of their projects. Backzone was mentioned as an alternative but, as Tom Lane pointed out, it is not truly a workable solution for projects like PostgreSQL:

However, the Postgres project is finding itself in a hard place precisely because we *didn't* adopt backzone. We reasoned that the default set of zones was the preferred thing and thus would be the most likely to remain stable. Now, not only is the default different (which perhaps we could live with), but there's no way at all to get the old default. That's not okay, and it seems to me to fly in the face of most understandings of software backwards compatibility, never mind any tzdb-specific rules.

Eggert did not directly address the breach claim, but a few days later posted a compromise idea that would provide a build flag to create the database in two different ways: as it was before the merges and as it is with them. But the question then becomes: which is the default? Many applications do not build the database, but use it as distributed in the tarfile, Colebourne said. Eggert was not opposed to providing an alternate tarfile, but did not seem inclined to revert the merges that he proposed.

For their part, Colebourne and seemingly everyone else participating in the thread were willing to work on some kind of technical solution to the problems, but thought that the current merges should be reverted first. There are ways to derive the two different views of the data, Colebourne said, but that requires keeping the existing data in the main database file: that file can be processed to perform the merges Eggert wants automatically, but the reverse is not true. In another message, he described the situation as an impasse, saying that there had been many requests for reversion and "no requests to retain it".

There are technical solutions available to reduce the amount of data published to downstream users, but the starting point must be a fully populated database, not one that is logically broken. The next action must be to revert. Then we can agree on any technical measures necessary.
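Colebourne's point that the processing only works in one direction can be illustrated with a small model. In the hypothetical sketch below, a zone's history is a map from year to an offset description (the years and labels are placeholders, not real tzdb data), and `derive` produces a merged view by replacing a zone with a Link when its post-1970 history matches a proposed target. Two full datasets that differ only before 1970 yield identical merged views, so nothing can recover the discarded history from the merged view alone.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model showing that deriving a merged view discards information.
public class MergedView {
    static final int CUTOFF = 1970;

    // Placeholder history with one pre-1970 entry and one post-1970 entry.
    static SortedMap<Integer, String> history(String pre1970, String post1970) {
        SortedMap<Integer, String> h = new TreeMap<>();
        h.put(1900, pre1970);
        h.put(1980, post1970);
        return h;
    }

    // Returns name -> either the zone's full history (as text) or "Link -> target".
    static Map<String, String> derive(Map<String, SortedMap<Integer, String>> full,
                                      Map<String, String> proposedLinks) {
        Map<String, String> view = new TreeMap<>();
        for (var e : full.entrySet()) {
            String name = e.getKey();
            String target = proposedLinks.get(name);
            if (target != null && full.containsKey(target)
                    && e.getValue().tailMap(CUTOFF).equals(full.get(target).tailMap(CUTOFF))) {
                view.put(name, "Link -> " + target);     // pre-1970 entries are dropped here
            } else {
                view.put(name, e.getValue().toString()); // full history is kept
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> proposed = Map.of("Europe/Stockholm", "Europe/Berlin");
        // Two full datasets that differ only in Stockholm's pre-1970 entry.
        Map<String, SortedMap<Integer, String>> fullA = Map.of(
            "Europe/Berlin", history("Berlin pre-1970", "shared rules"),
            "Europe/Stockholm", history("Stockholm pre-1970", "shared rules"));
        Map<String, SortedMap<Integer, String>> fullB = Map.of(
            "Europe/Berlin", history("Berlin pre-1970", "shared rules"),
            "Europe/Stockholm", history("some other pre-1970", "shared rules"));
        // prints true: both collapse to the same merged view
        System.out.println(derive(fullA, proposed).equals(derive(fullB, proposed)));
    }
}
```

The full view is simply the input itself, so keeping the fully populated data makes both views available, while publishing only the merged view makes the full one unrecoverable.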

Colebourne started a thread on what data tzdb should contain. It described the kinds of data present in the database, how they are used, and the problems that need to be addressed with them. It offered up a proposal based on his plan to automatically process the file to create the merges for regions that share post-1970 history, but to retain the existing data so that pre-1970 history would not be moved to backzone. The proposal was received positively, though there was some constructive criticism; Eggert did not really participate in that thread, however.

Samoa

For a few months, that is where things stood. The development version of tzdb had the merges Eggert proposed, along with various other fixes made along the way. On September 13, in response to a query about the status of the merge changes, Eggert said that a new release was not imminent. But, then, along came Samoa.

On September 20, Geoffrey D. Bennett posted a notice that, on September 15, Samoa had decided to stop observing daylight saving time. That meant tzdb needed to change to reflect the decision, and to do so before the scheduled September 26 DST switch. As Eggert put it: "That's not much notice".

Later that day, Colebourne posted "Preparing to fork tzdb". An imminent release, which seemed likely to include the merge changes, meant that a fork was needed in order to maintain the zones as they were in the 2021a release, he said. He would prefer that Eggert revert the changes, but:

In the event that the tzdb maintainer does not revert, consideration must be given to forking the project. The purpose of the fork would initially be to maintain the tzdb data set as it was prior to the dispute. This would then be released in parallel to the original tzdb to ensure that downstream projects do not each do their own thing (ie. to minimize incompatibilities downstream).

Colebourne asked if there was support for such a fork and whether there were people or organizations willing to assist. For the most part, the reaction to the idea of a fork was unfavorable; there were exceptions (Lane and Rethans, for example). Eliot Lear noted several downsides, including confusion among users and implementers, as well as fragmentation of expertise between the two projects. He suggested proposing changes to RFC 6557 as a way forward. In a somewhat similar vein, Emily Crandall Fleischman suggested invoking the procedure to replace the coordinator as a better alternative than a fork.

Eggert said that the fork would be discriminatory and that it would take a lot of work to fix the fork:

Such a fork would arbitrarily discriminate against countries like Angola and Niger, and in favor of countries like Norway and Sweden.

A primary goal of the recent patches was to avoid racial or national preferences that were present in the previous setup. Arguably these preferences were not intentional, or were apparent and not real; however, that's not an argument I would want to defend.

He suggested working together on technical solutions to resolve the problems that stem from the changes. He also objected to the idea that the data was getting "wiped out" by the changes. But, as Lane pointed out, including the backzone data "does *not* reproduce what was formerly the default set of zones". He said that it might be technically correct to argue that the data is not going away, but that does not really reflect the reality of the situation:

I'm all for improving equity in tzdb's coverage, but I think it should be done by adding coverage for underserved areas, not removing data from areas that had been well-covered. And let's make no mistake: removing data from the default build is removing data, for many downstream users who won't have an opportunity to make their own decisions about what their platforms provide.

Colebourne renewed his call for a reversion. Like Lane, he believes that adding more data is the solution to the problem, but another possibility would be to remove all pre-1970 data from tzdb by moving it to backzone. In the meantime, the path forward is clear in his mind:

The *only* good faith move you can make right now is to revert the patch. I'm quite happy to discuss practical solutions once that is done. If 2021b is released with the disputed patch then the fork will occur, and you as TZ coordinator will have directly caused the fork.

As suggested, Colebourne also polled the list to see if there was a consensus that a change to the coordinator is needed. The results of that were a resounding "no", which he acknowledged in the thread. His June appeal to the Internet Engineering Steering Group (IESG) about a breach of the charter was answered on September 22. Murray S. Kucherawy said that he disagreed with Colebourne's arguments, though that is not necessarily the final answer if Colebourne wishes to pursue it further. Colebourne said that both of the formal options for relief had been tried and failed:

The potential options remaining are to fork the project or to solve the issue. For the avoidance of doubt, my preferred option would be to solve the issue.

He noted that there was strong support for the idea of releasing 2021b with just the minimal changes needed to support the Samoa change, then taking a week or so to calm everything down and start to work on other solutions. He asked Eggert if he would do so, but it is perhaps not surprising that Eggert declined. He is concerned that the discrimination problems are now more visible because of the dispute, so he needs to act now:

Unfortunately, the equity issue has broadened and is now visible outside our little community, and I really and sincerely doubt whether it'd be a good idea for us to do nothing about it now. We need to establish that we are fixing the problem and are not deferring action to a never-never land of arcane bureaucracy, and we need to do so in terms that will be clear to outsiders.

He did compromise to some extent by proposing to merge only nine zones for 2021b, rather than the 30+ he had originally proposed. That would provide evidence that progress was being made, while avoiding the biggest problem area:

[...] the idea is to revert most (but not all) of the objected-to changes. In particular, this will revert the changes to Europe/Oslo and Europe/Stockholm, which have drawn the most objections. The idea is to take the first step now, and to take more steps in future releases (which should not be distant-future releases, as we need to continue to make and exhibit a good-faith effort to fix the problem).

That was not acceptable to Colebourne, again unsurprisingly. But meanwhile, the clock was ticking. Other proposals were made; Russ Allbery wanted to reframe the debate by changing the "naming layer", while Lane tried to find a way to maintain the existing set of zones (and all of their historical data) going forward. Colebourne summarized the whole issue regarding pre-1970 data, while attempting to be even-handed; that led to yet another enormous thread, though he asked that only actual corrections be posted. He also put out a lengthy blog post about the dispute.

2021b release

On September 25, Eggert released version 2021b of tzdb with the merges of nine zones as his amended proposal indicated. He followed up the release announcement with a justification for the release and the choices made in it. Eggert simply sees the 30+ changes he proposed in May as the endgame for a process that started in 2013, though he did acknowledge problems with making changes to so many zones at once, thus the reduction to nine for 2021b.

Historical data is used mostly by astrology programs, he said, and it is "typically grossly inadequate for realistic use outside the named location". Tzdb focuses on accurate data for 1970 and beyond; his efforts to merge zones are part of that. Now that the fairness issue has come to the fore, it is time to deal with it:

Norway and Sweden have triggered concerns, much more so than similar changes made (for example) to Angola and Congo in tzdb 2014g.

[...] It's a bad look for us that so much concern about Norway and Sweden has appeared on this mailing list, even though hardly anybody seems to have cared about Angola and Congo. It'll be an even worse look if we ignore this issue weeks, months or even years after it's been made clear to us.

[...] With all this in mind, issuing 2021b now is a significant step toward equity in tzdb. It will let us say that we are moving toward a fair process, and will give us the opportunity and motivation to improve on that process and to address and balance the various other concerns that have recently appeared on the mailing list.

As might be guessed, Colebourne was very unhappy with the release.

In summary, I am livid with the high-handed approach you have taken wrt the release of 2021b. Despite near unanimity of the mailing list requesting you to release 2021a+MinimalChanges, you progressed 9 out of the 30 link merges based on a rationale that you acknowledge is not universally accepted.

He said that he would be taking a few days away from the issue, but planned to "start a positive discussion as to what the next steps can be" after that. In his blog post he noted that he would be looking into alternatives, such as perhaps moving a fork of tzdb under the Unicode Common Locale Data Repository (CLDR) project. It turns out that Eggert is not opposed to CLDR being involved in some fashion. Perhaps some kind of compromise can be found in that direction.

The dispute spanned multiple gigantic threads in May, June, and September. The call to fork tzdb also spawned several heads-up emails to LWN; thanks for those. It is the most visible thing to happen in the normally quiet tzdb arena since the 2011 lawsuit against Olson and the subsequent move of tzdb under the Internet Corporation for Assigned Names and Numbers (ICANN).

It is a little hard to see how users of tzdb are served by making pre-1970 zone information worse for some places, even if those places had been "elevated" in status incorrectly along the way. Dumping that information into the backzone file is tantamount to losing it completely, though the historical record of those moves could be used to reconstruct things. Perhaps a separate "historical tzdb" project is needed that better serves the needs of astrologers and others who have needs for that kind of information. It would be plausible to use the existing tzdb contents as at least a starting point—perhaps more than that.

The timing crunch caused by Samoa's late decision on DST was disruptive not only for tzdb but also for residents of Samoa, as Eggert noted. Two weeks is not a lot of time to get the word out, even outside of the computer realm; within it, many computers and devices did not magically update to 2021b, so they switched to DST as previously scheduled.

It is also unfortunate that the coordinator took the opportunity to lock in these controversial changes on (relatively) short notice over the vehement opposition of some. However inequitable the zone choices in 2021a (and before) were, things had been that way for a long time; disrupting users and developers to create a kind of fait accompli is not a particularly good look either. There are already questions on the mailing list about what distributions and other tzdb users should do with the changes. Taking a bit more time to come up with a scheme that addressed all of the concerns, then making all of the changes at once using that mechanism in a month or two hardly seems burdensome—or unfair—but here we are.