Evolving Git for the next decade


Git is ubiquitous; in the last two decades, the version-control system has truly achieved world domination. Almost every developer uses it and the vast majority of open-source projects are hosted in Git repositories. That does not mean, however, that it is perfect. Patrick Steinhardt used his main-track session at FOSDEM 2026 to discuss some of its shortcomings and how they are being addressed to prepare Git for the next decade.

Steinhardt said that he began to be interested in open-source software around 2002, when he was 11 years old. He bought his first book on programming when he was 12, and made his first contribution to an open-source project in 2011. He became a Git and libgit2 contributor in 2015, has been a backend engineer at GitLab since 2020, and became the manager of the Git team there in 2024.

Git must evolve

Git turned 20 last year; there are millions of Git repositories and even more scripts depending on Git. "The success of Git is indeed quite staggering." However, the world has changed quite a bit since Git was first released in 2005; it was designed for a different era. When Git was released, SHA-1 was considered to be a secure hash function; that has changed, he said, with the SHAttered attack that was announced in 2017 by Centrum Wiskunde & Informatica (CWI) and Google. In 2005, the Linux kernel repository was considered big; now it is dwarfed by Chromium and other massive monorepos. Continuous-integration (CI) pipelines were the exception, he said, in 2005—but now projects have pipelines with lots of jobs that are kicked off every time there's a new commit.

Also, Steinhardt said to general laughter: "Git was very hard to use back then; but to be quite honest, Git's still hard to use nowadays." So, the world has changed and Git needs to change with it. But, he said, the unique position of Git means that it can't have a revolution; too many projects and developers rely on it. Instead, it needs to evolve, and he wanted to highlight some of the important transitions that Git is going through.

SHA-256

The most user-visible change that Git is going through today, he said, is the SHA-256 transition. SHA-1 is a central part of the project's design; every single object stored in Git, such as files (blobs), directory trees, and commits, has an identity that is computed by hashing the contents of the object. Objects are content-addressable: "given the contents, you know the name of the object". That name, of course, is computed using the no-longer-secure SHA-1.
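What content addressing means in practice can be seen with nothing more than a shell: an object's ID is the hash of a short header ("<type> <size>" plus a NUL byte) followed by the object's contents, so the well-known ID of the empty blob can be reproduced by hand. This sketch was not part of the talk:

```shell
# The ID of the empty blob is the SHA-1 of the 7-byte string "blob 0\0":
printf 'blob 0\0' | sha1sum
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391  -

# "git hash-object" performs the same computation:
git hash-object --stdin </dev/null
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```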

The work by CWI and Google proved that attacks on SHA-1 are viable. It requires a lot of compute, about 110 years' worth of single-GPU computation, but it is possible. He noted that, with all of the hype around artificial intelligence, data centers have greatly increased their GPU capacity. "It is very much in reach of a large player to compute hash collisions".

SHAttered kicked off quite a few conversations on the Git mailing lists. During those conversations, he said, it has been asserted that the use of SHA-1 is not primarily for security, and a number of arguments have been made to back that up. The SHA-1 object hash is primarily used as an integrity check to detect transmission errors or bit flips. Also, source code is transparent: "if you see a merge request where somebody enters random collision data into your code, then you might probably ask some questions". Additionally, there are other security measures, such as GPG signatures, HTTPS transport, and a web of trust among developers, which mean that Git does not rely on SHA-1 alone.

"But the reality is that things are a little bit more complicated", Steinhardt said. Git may not rely on SHA-1 for security, but everyone else does. When developers sign a commit with Git, for example, it is the SHA-1 hash that is signed. It might be noticeable if source code is changed to cause a collision, but binary blobs such as firmware are not human-readable, so there is no way to easily see that there is a malicious file. Tooling around Git also assumes collision resistance, so CI systems, scripts, and such all trust the SHA-1 hash.

Finally, various government and enterprise requirements have mandated removal of SHA-1 by 2030, so Git needs to move on. And it has: SHA-256 support was added in October 2020, with version 2.29. "But nobody is using it", Steinhardt said, because ecosystem support is lacking. "Unfortunately, this situation looks somewhat grim". There is full support in Git itself, in the Dulwich Python implementation, and in the Forgejo collaboration platform. There is experimental support for SHA-256 in GitLab, go-git, and libgit2. Other popular Git tools and forges, including GitHub, have no support for SHA-256 at all. That creates a chicken-and-egg problem, he said: nobody is moving to SHA-256 because it is not supported by large forges, and large forges are not implementing support because there is no demand.
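For those who want to experiment, creating a SHA-256 repository takes a single flag; the repository name below is arbitrary:

```shell
# Create a repository that uses SHA-256 object names (Git 2.29+):
git init --object-format=sha256 sha256-demo

# The object format is recorded permanently in the repository:
git -C sha256-demo rev-parse --show-object-format
# sha256

# Object IDs are now 64 hex characters rather than 40:
git -C sha256-demo hash-object --stdin </dev/null | wc -c
# 65  (64 hex digits plus a newline)
```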

The problem, Steinhardt said, is that we cannot wait forever. It will become more and more feasible to break SHA-1, and the next cryptographic weakness may be just around the corner. Even if there were full support for SHA-256 today, projects would still need time to migrate. Git will make SHA-256 the default for newly created repositories in 3.0, he said. The hope is to force forges and third-party implementations to adapt. "The transition will likely not be an easy one, and it may result in a few hiccups along the road." When 3.0 will be released is still up in the air; a discussion on the Git mailing list in October 2025 about its release date did not result in a firm decision.

He said that the audience could help to move things along. "You can show your favorite code forges that you care about SHA-256 so they bump the priority." He also encouraged people to help by testing SHA-256 with new projects and adding support to third-party tools that depend on Git. "Together, we can hopefully get the ecosystem to move before the next vulnerability".

Reftables

Another significant shift for Git, which he declared his favorite topic for discussion, is the move to reftables. By default, Git stores references as "loose" references, where each is stored as a separate file such as "refs/heads/main". The format for these files is straightforward to understand, he noted, but storing every single reference as a file does not scale well. It is fine for a project with a handful of references, but if there are hundreds or thousands then it becomes really inefficient.

To deal with that inefficiency today, Git will create a packed-refs file; this can be done manually with "git pack-refs --all", but Git will also do it automatically. However, Steinhardt said, Git still needs to change the way it deals with references.
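The difference between the two storage styles is easy to see in a scratch repository; the branch names and committer identity below are made up for the demonstration:

```shell
# Every loose reference is a file of its own under .git/refs/:
git init -b main ref-demo && cd ref-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit"
cat .git/refs/heads/main          # one file holding one 40-char hash

# "git pack-refs" collapses all loose refs into a single sorted file:
git branch topic
git pack-refs --all
test ! -f .git/refs/heads/main    # the loose file is gone...
grep refs/heads .git/packed-refs  # ...both refs now live in packed-refs
```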

The first reason he gave is that "filesystems are simply weird". Many filesystems, for example, are case-insensitive by default, which means that Git cannot have two branches whose names differ only in case. It is also an inefficient design, he said: to create 20 different references, Git has to create 20 different files. That may not take long from a performance perspective, but each reference requires 4KB of storage on typical filesystems. That begins to add up quickly.

Packed references are computationally expensive, he said, which is not a problem if a project only has a few references. "But, Git users are not always reasonable." He said that GitLab hosts one repository with about 20-million references; each time a reference is deleted, the packed-refs file has to be completely rewritten, which means rewriting 2GB of data. "To add insult to injury, this repository typically deletes references every couple seconds."

The third problem Steinhardt described is that concurrency is an afterthought. It is impossible to get a consistent view of all references when there are multiple readers and writers in a repository at the same time. When a user writes to a repository while another user is reading the references, it is impossible to know if they are getting a consistent result or a mixture of the old and new state.

Those problems have been known for a long time, he said, and that is where the reftable backend comes into the picture. Users can create a new repository with the reftable backend today. References are stored in a binary format rather than a text-based one, which is more efficient, though it does mean that the files are no longer human-readable. The new data structure also allows Git to perform atomic updates when writing references, and Git is no longer subject to filesystem limitations when it comes to naming references.
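Creating such a repository is, again, a one-flag affair in current Git versions, and the layout under .git/ changes accordingly; the repository name is invented for the example:

```shell
# Create a repository with the reftable backend (Git 2.45 or later):
git init --ref-format=reftable reftable-demo

git -C reftable-demo rev-parse --show-ref-format
# reftable

# References now live in binary tables listed in tables.list, rather
# than in one file per reference under .git/refs/:
ls reftable-demo/.git/reftable/
```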

As with SHA-256, reftables will become the default in Git 3.0. "So if you use Git in scripts or on the server side, you should make sure you don't play weird games by accessing references directly on the filesystem". Instead, Git users should always access references with the git command.
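Backend-agnostic plumbing commands for that purpose already exist; a sketch, assuming a repository that has a "main" branch (the "backup" ref name is invented):

```shell
# Resolve a reference to an object ID (never read .git/refs/ directly):
sha=$(git rev-parse refs/heads/main)

# Enumerate references instead of listing files under .git/refs/:
git for-each-ref --format='%(refname:short)' refs/heads/

# Create, update, and delete references atomically:
git update-ref refs/heads/backup "$sha"
git update-ref -d refs/heads/backup
```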

Large files

Steinhardt said that, for most of the people in the room, the scalability problems related to references were mostly theoretical and rarely encountered in practice. When it comes to scalability bottlenecks, "the more important problem tends to be large files". Storing large binary files in Git is, unfortunately, not a use case that is well-supported today. There are third-party workarounds, such as Git LFS and git-annex, but the Git project would like to solve the problem directly.

Large files are a problem for Git because of the way that it compresses objects, he said. It works extremely well when working with text files, such as source code, because that is what Git was designed for. But Git's compression does not work well for binary files, and even small edits to such files means creating entirely new objects.

Another problem is that when cloning a repository, the user gets a full copy of all of its history by default. That's desirable, he said, for normal repositories; but for large monorepos with binary files, "you probably don't want to download hundreds of gigabytes of data". In addition, there is no support for resuming a cloning operation: if it fails, the user has to start over. "So if you have downloaded 400GB out of a 500GB repository and your network disconnects, then you will have to redownload everything."

Code forges also struggle with large files. Users can resort to partial clones to avoid downloading an entire repository, but forges do not have that luxury. The consequence of that is significant storage costs. He said that an analysis of GitLab's hosted repositories has shown that 75% of the site's storage space is consumed by binary files larger than 1MB. Huge repository sizes also cause repository maintenance to become computationally expensive. Other types of web sites might offload large files to content-delivery networks (CDNs), but that is not an option for Git forges, he said. "All data needs to be served by the Git server, and that makes it become a significant bottleneck." Large objects are a significant cost factor for any large Git provider.

Git LFS and partial clones can help users, but those are just band-aids, Steinhardt said. Even though partial clones have been a feature in Git for quite a while, "I bet many of you have never used them before". And even when users do use partial clones, servers still cannot offload the files to a CDN.
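A partial clone is easy to try against a local repository; the names and the binary file below are invented for the example, and real forges already enable filtered fetches on the server side:

```shell
# Build a throwaway source repository that permits filtered fetches:
git init -b main src
git -C src config uploadpack.allowFilter true
dd if=/dev/zero of=src/big.bin bs=1M count=4 2>/dev/null
git -C src add big.bin
git -C src -c user.name=demo -c user.email=demo@example.com \
    commit -m "add a binary" >/dev/null

# A blob-less partial clone fetches commits and trees up front, but
# downloads blobs only when they are actually needed (e.g. at checkout).
# "--filter=blob:limit=1m" would instead skip only blobs larger than 1MB.
git clone --filter=blob:none "file://$PWD/src" partial-demo

# The filter is recorded in the clone's configuration:
git -C partial-demo config remote.origin.partialCloneFilter
# blob:none
```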

The solution is large-object promisors: a promisor is a remote that is used only to store large blobs, separate from the main remote that stores the rest of the repository's objects. The functionality is now built directly into Git, and is transparent to the client, he said.

In addition, large-object promisors could be served over protocols other than HTTPS and SSH. That would allow, for example, serving large objects via the S3 API. "This allows us to offload objects to a CDN and store large blobs in a format that is much better suited for them".
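On the client side, the pre-existing promisor-remote mechanism that this work builds on is plain repository configuration; the remote name and URL in this sketch are hypothetical:

```ini
# .git/config (sketch; "large-objects" and the URL are invented)
[remote "large-objects"]
	url = https://blobs.example.com/project.git
	promisor = true
	partialCloneFilter = blob:limit=1m
```

With such a remote configured, Git knows it can fetch any missing large blobs from there on demand rather than requiring them to be present locally.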

Even with promisors, though, Steinhardt said that Git still does not handle binary files efficiently on the client side. "This is where pluggable object databases come into play, which will allow us to introduce a new storage format for a large binary file specifically." Git needs a format designed for binaries, he said, where incremental changes to a binary file only lead to a small storage increase. It needs to be efficient for any file size.

In addition, a new format would need to be compatible with Git's existing storage format so that users could mix and match: the old format for text files and the new format for large binaries. Git's storage format is "deeply baked in", he said, but alternate implementations like libgit2 and go-git already have pluggable storage backends. "So there is no fundamental reason why Git can't do this too. It requires a lot of plumbing and refactoring, but it's certainly a feasible thing."

The two efforts to handle large objects, promisors and pluggable object databases, are progressing in parallel. The promisors effort is farther along, with the initial protocol implementation shipped in Git 2.50, and additional features in Git 2.52, both released in 2025. He said that it is quite close to being usable on the client side, though when support for promisors will arrive in Git forges is still undetermined.

The pluggable object database work is not that far along, he said. Over the past few Git releases the project has spent significant time refactoring how Git accesses objects. In 2.53, which was released a few days after his talk, Git shipped a unified object-database interface that will make it easier to change the format in the future. He said that he expected a proof of concept in Git 2.54, though implementing a viable format for binary files "will probably take a little bit longer".

User-interface improvements

One area of Git that tends to draw plenty of complaints is its user interface, he said. Many of Git's commands are extremely confusing, and some workflows "are significantly harder than they have any right to be". Recently, Git has had competition in the form of the Jujutsu version-control project that has made the Git project take a hard look at what it is doing. (LWN covered Jujutsu in January 2024.)

Jujutsu is a Git-compatible, Rust-based project started by Martin von Zweigbergk. It has a growing community and Steinhardt said that "many people seem to prefer the Jujutsu experience way more" than using Git. That is not much of a surprise, he said; Git's user interface has grown organically over two decades. It has "inconsistencies and commands that just don't feel modern". On the other hand, Jujutsu started from scratch and learned from Git's mistakes.

Early on, Steinhardt said he had looked at Jujutsu and found it confusing. "It just didn't make sense to me at all, so I simply discarded it." However, after noticing that there was a steady influx of people who did like it, he opted for another look. That time, something clicked. "That moment when you realize that a tool simply fixes all the UI issues that you had and that you have been developing for the last 20 years was not exactly great." He had two options: despair or learn from the competition. He chose to learn from it.

There are a number of things that Jujutsu got right, he said. For example, history is malleable by default. "It's almost as if you were permanently in an interactive rebase mode, but without all the confusing parts." When history is rewritten in Jujutsu all dependents update automatically "so if you added a commit, all children are rebased automatically". Conflicts are data, not emergencies. "You can commit them and resolve them at any later point in time." These features are nice to have, he said, and fundamentally change how users think about commits. "You stop treating them as precious artifacts and rather start treating them as drafts that you can freely edit".

But, he said, Git is old: the project cannot simply completely revamp its UI and break users' workflows. There are some things that Git can steal from Jujutsu, though. He discussed the workflow for splitting a Git commit, which involves seven separate commands with Git's current UI. Most users do not know how to do this, he said. The goal is to add several "opinionated subcommands" that make more modern styles of working with merge requests, such as stacked branches, much easier.
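For reference, the current dance looks roughly like this; the file names are invented, and the interactive-rebase steps can be dropped when the commit being split is the most recent one:

```shell
# Splitting a commit with today's UI.  When the commit is not at the
# tip of the branch, the sequence is wrapped in an interactive rebase:
git rebase -i '<commit>^'     # mark the commit as "edit" in the editor
git reset HEAD^               # undo the commit, keeping its changes
git add first-part.c          # stage the first half...
git commit -m "first half"    # ...and commit it
git add second-part.c         # stage the rest...
git commit -m "second half"   # ...and commit that too
git rebase --continue         # finish the rebase
```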

The first two of those subcommands, planned for Git 2.54, are "git history split" and "git history reword". Future releases will add more history-editing subcommands and learn more from Jujutsu.

Steinhardt did not have time for questions; he closed the talk by saying that it had been a "whirlwind tour" through what is cooking in Git right now, and hoped that it had provided a clear picture of what the project was up to.

The video for the talk is now available on the FOSDEM 2026 web site. Slides have not yet been published.

[I would like to thank the Linux Foundation, LWN's travel sponsor, for funding my travel to Brussels to attend FOSDEM.]


Index entries for this article
Conference: FOSDEM/2026