Git is ubiquitous; in the last two decades, the version-control system has truly achieved world domination. Almost every developer uses it and the vast majority of open-source projects are hosted in Git repositories. That does not mean, however, that it is perfect. Patrick Steinhardt used his main-track session at FOSDEM 2026 to discuss some of its shortcomings and how they are being addressed to prepare Git for the next decade.
Steinhardt said that he began to be interested in open-source software around 2002, when he was 11 years old. He bought his first book on programming when he was 12, and made his first contribution to an open-source project in 2011. He became a Git and libgit2 contributor in 2015, has been a backend engineer at GitLab since 2020, and became the manager of the Git team there in 2024.
Git must evolve
Git turned 20 last year; there are millions of Git repositories and even more scripts depending on Git. "The success of Git is indeed quite staggering." However, the world has changed quite a bit since Git was first released in 2005; it was designed for a different era. When Git was released, SHA-1 was considered to be a secure hash function; that has changed, he said, with the SHAttered attack that was announced in 2017 by Centrum Wiskunde & Informatica (CWI) and Google. In 2005, the Linux kernel repository was considered big; now it is dwarfed by Chromium and other massive monorepos. Continuous-integration (CI) pipelines were the exception, he said, in 2005, but now projects have pipelines with lots of jobs that are kicked off every time there is a new commit.
Also, Steinhardt said to general laughter: "Git was very hard to use back then; but to be quite honest, Git's still hard to use nowadays." So, the world has changed and Git needs to change with it. But, he said, the unique position of Git means that it can't have a revolution; too many projects and developers rely on it. Instead, it needs to evolve, and he wanted to highlight some of the important transitions that Git is going through.
SHA-256
The most user-visible change that Git is going through today, he said, is the SHA-256 transition. SHA-1 is a central part of the project's design; every single object stored in Git, such as files (blobs), directory trees, and commits, has an identity that is computed by hashing the contents of the object. Objects are content-addressable: "given the contents, you know the name of the object". That name, of course, is computed using the no-longer-secure SHA-1.
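Content addressing is easy to demonstrate in a shell, assuming git and the coreutils sha1sum are installed: a blob's object ID is the SHA-1 of a short header ("blob <size>" plus a NUL byte) followed by the file's contents.

```shell
# A blob's ID is SHA-1("blob <size>\0<contents>").
# For the six-byte file containing "hello\n":
printf 'blob 6\0hello\n' | sha1sum
# -> ce013625030ba8dba906f756967f9e9ca394464a
# git computes the same name, without storing anything:
printf 'hello\n' | git hash-object --stdin
# -> ce013625030ba8dba906f756967f9e9ca394464a
```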
The work by CWI and Google proved that attacks on SHA-1 are viable. It requires a lot of compute, about 110 years worth of single-GPU computation, but it is possible. He noted that with all the hype around artificial intelligence, data centers have greatly increased their GPU capacity. "It is very much in reach of a large player to compute hash collisions".
SHAttered kicked off quite a few conversations on the Git mailing lists. During these conversations, he said, it has been asserted that Git's use of SHA-1 is not primarily for security, and a number of arguments have been made to back that up. The SHA-1 object hash is primarily used as an integrity check to detect transmission errors or bit flips. Also, source code is transparent: "if you see a merge request where somebody enters random collision data into your code, then you might probably ask some questions". Additionally, there are other security measures, such as GPG signatures, HTTPS transport, and a web of trust among developers, that mean Git does not rely on SHA-1 alone.
"But the reality is that things are a little bit more complicated", Steinhardt said. Git may not rely on SHA-1 for security, but everyone else does. When developers sign a commit with Git, for example, it is the SHA-1 hash that is signed. It might be noticeable if source code is changed to cause a collision, but binary blobs such as firmware are not human-readable, so there is no way to easily see that there is a malicious file. Tooling around Git also assumes collision resistance, so CI systems, scripts, and such all trust the SHA-1 hash.
Finally, various government and enterprise requirements have mandated removal of SHA-1 by 2030, so Git needs to move on. And it has: SHA-256 support was added in October 2020, with version 2.29. "But nobody is using it", Steinhardt said, because ecosystem support is lacking. "Unfortunately, this situation looks somewhat grim".
There is full support in Git itself, in the Dulwich Python implementation, and in the Forgejo collaboration platform. There is experimental support for SHA-256 in GitLab, go-git, and libgit2. Other popular Git tools and forges, including GitHub, have no support for SHA-256 at all. That creates a chicken-and-egg problem, he said: nobody is moving to SHA-256 because it is not supported by large forges, and large forges are not implementing support because there is no demand.
The problem, Steinhardt said, is that we cannot wait forever. It will become more and more feasible to break SHA-1, and the next cryptographic weakness may be just around the corner. Even if there were full support for SHA-256 today, projects still need time to migrate. Git will make SHA-256 the default for newly created repositories in 3.0, he said. The hope is to force forges and third-party implementations to adapt. "The transition will likely not be an easy one, and it may result in a few hiccups along the road." When 3.0 will be released is still up in the air; a discussion about its release date in October 2025 on the Git mailing list did not result in a firm decision.
He said that the audience could help to move things along. "You can show your favorite code forges that you care about SHA-256 so they bump the priority." He also encouraged people to help by testing SHA-256 with new projects and adding support to third-party tools that depend on Git. "Together, we can hopefully get the ecosystem to move before the next vulnerability".
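Trying SHA-256 with a new project is a one-liner: since Git 2.29, git init accepts an object-format option (the repository name here is illustrative).

```shell
# Create a repository whose objects are named by SHA-256 (Git 2.29+)
git init --object-format=sha256 sha256-demo
# Confirm which object format the repository uses:
git -C sha256-demo rev-parse --show-object-format
# -> sha256
```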
Reftables
Another significant shift for Git, which he declared his favorite topic for discussion, is the move to reftables. By default, Git stores references as "loose" references, where each is stored as a separate file such as "refs/heads/main". The format for these files is straightforward to understand, he noted, but storing every single reference as a file does not scale well. It is fine for a project with a handful of references, but if there are hundreds or thousands then it becomes really inefficient.
To deal with that inefficiency today, Git will create a packed-refs file; this can be done manually with "git pack-refs --all", but Git will also do it automatically. However, Steinhardt said, Git still needs to change the way it deals with references.
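The difference between the two layouts is easy to see in a scratch repository (names are illustrative, and the commands assume a configured Git identity):

```shell
git init -q refs-demo && cd refs-demo
git commit -q --allow-empty -m 'initial commit'
git branch feature
ls .git/refs/heads        # one loose file per branch
git pack-refs --all       # consolidate them...
cat .git/packed-refs      # ...into a single sorted text file
ls .git/refs/heads        # now empty
```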
The first reason he gave is that "filesystems are simply weird". Many filesystems, for example, are case-insensitive by default; that means, as just one example, that Git cannot have two branches whose names differ only in case. It is also an inefficient design, he said: to create 20 different references, Git has to create 20 different files. That may not take long from a performance perspective, but each reference requires 4KB of storage on typical filesystems. That begins to add up quickly.
Packed references are computationally expensive, he said, which is not a problem if a project only has a few references. "But, Git users are not always reasonable." He said that GitLab hosts one repository with about 20 million references; each time a reference is deleted, the packed-refs file has to be completely rewritten, which means rewriting 2GB of data. "To add insult to injury, this repository typically deletes references every couple of seconds."
The third problem Steinhardt described is that concurrency is an afterthought. It is impossible to get a consistent view of all references when there are multiple readers and writers in a repository at the same time. When a user writes to a repository while another user is reading the references, it is impossible to know if they are getting a consistent result or a mixture of the old and new state.
Those problems have been known for a long time, he said, and that is where the reftable backend comes into the picture. Users can create a new repository with a reftable today. The tables are now stored in a binary format rather than a text-based one, which is more efficient, though it does mean that the files are no longer human-readable. The new data structure also allows Git to perform atomic updates when writing references to the reference table, and Git is no longer subject to filesystem limitations when it comes to naming references.
As with SHA-256, reftables will become the default in Git 3.0. "So if you use Git in scripts or on the server side, you should make sure you don't play weird games by accessing references directly on the filesystem". Instead, Git users should always access references with the git command.
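Since Git 2.45, a reftable repository can be created at init time, and the usual plumbing commands work unchanged on top of it (the repository name is illustrative):

```shell
# Opt into the reftable backend at creation time (Git 2.45+)
git init --ref-format=reftable reftable-demo
git -C reftable-demo rev-parse --show-ref-format   # -> reftable
# Scripts should go through plumbing, not .git/refs paths:
git -C reftable-demo for-each-ref --format='%(refname)'
```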
Large files
Steinhardt said that, for most of the people in the room, the scalability problems related to references were mostly theoretical and rarely encountered in practice. When it comes to scalability bottlenecks, "the more important problem tends to be large files". Storing large binary files in Git is, unfortunately, not a use case that is well-supported today. There are third-party workarounds, such as Git LFS and git-annex, but the Git project would like to solve the problem directly.
Large files are a problem for Git because of the way that it compresses objects, he said. Its compression works extremely well for text files, such as source code, because that is what Git was designed for. But it does not work well for binary files, and even small edits to such files mean creating entirely new objects.
Another problem is that, when cloning a repository, the user gets a full copy of all of its history by default. That is desirable, he said, for normal repositories; but for large monorepos with binary files, "you probably don't want to download hundreds of gigabytes of data". In addition, there is no support for resuming a cloning operation: if it fails, the user has to start over. "So if you have downloaded 400GB out of a 500GB repository and your network disconnects, then you will have to redownload everything."
Code forges also struggle with large files. Users can resort to partial clones to avoid downloading an entire repository, but forges do not have that luxury. The consequence of that is significant storage costs. He said that an analysis of GitLab's hosted repositories has shown that 75% of the site's storage space is consumed by binary files larger than 1MB. Huge repository sizes also cause repository maintenance to become computationally expensive. Other types of web sites might offload large files to content-delivery networks (CDNs), but that is not an option for Git forges, he said. "All data needs to be served by the Git server, and that makes it become a significant bottleneck." Large objects are a significant cost factor for any large Git provider.
Git LFS and partial clones can help users, but those are just band-aids, Steinhardt said. Even though partial clones have been a feature in Git for quite a while, "I bet many of you have never used them before". And even when users do use partial clones, servers still cannot offload the files to a CDN.
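A partial clone defers blob downloads with a filter specification; the URL below is a placeholder:

```shell
# Fetch all history, but no file contents until a checkout needs them:
git clone --filter=blob:none https://example.com/big-repo.git
# Or keep small files locally and defer only blobs over 1MiB:
git clone --filter=blob:limit=1m https://example.com/big-repo.git
```

The server must permit filtering (the uploadpack.allowFilter option); otherwise the filter is ignored and a full clone is performed.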
The solution is large-object promisors, a remote that is used only to store large blobs and is separate from the main remote that stores other Git objects and the rest of the repository. The functionality is now built directly into Git, and is transparent to the client, he said.
In addition, large-object promisors could be served over protocols other than HTTPS and SSH. That would allow, for example, serving large objects via the S3 API. "This allows us to offload objects to a CDN and store large blobs in a format that is much better suited for them".
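Promisor remotes build on configuration that partial clones already use; a sketch of what a dedicated large-object remote could look like (the remote name, URL, and size threshold are all illustrative, and forge-side support is still pending):

```
[remote "large-objects"]
	url = https://cdn.example.com/repo
	promisor = true
	partialCloneFilter = blob:limit=1m
```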
Even with promisors, though, Steinhardt said that Git still does not handle binary files efficiently on the client side. "This is where pluggable object databases come into play, which will allow us to introduce a new storage format for a large binary file specifically." Git needs a format designed for binaries, he said, where incremental changes to a binary file only lead to a small storage increase. It needs to be efficient for any file size.
In addition, a new format would need to be compatible with Git's existing storage format, so that users could mix and match: the old format for text files and the new format for large binaries. Git's storage format is "deeply baked in", he said, but alternate implementations like libgit2 and go-git already have pluggable storage backends. "So there is no fundamental reason why Git can't do this too. It requires a lot of plumbing and refactoring, but it's certainly a feasible thing."
The two efforts to handle large objects, promisors and pluggable object databases, are progressing in parallel. The promisors effort is farther along, with the initial protocol implementation shipped in Git 2.50, and additional features in Git 2.52, both released in 2025. He said that it is quite close to being usable on the client side, though when support for promisors will arrive in Git forges is still undetermined.
The pluggable object database work is not that far along, he said. Over the past few Git releases, the project has spent significant time refactoring how Git accesses objects. In 2.53, which was released a few days after his talk, Git shipped a unified object-database interface that will make it easier to change the format in the future. He said that he expected a proof of concept in Git 2.54, though implementing a viable format for binary files "will probably take a little bit longer".
User-interface improvements
One area of Git that tends to draw plenty of complaints is its user interface, he said. Many of Git's commands are extremely confusing, and some workflows "are significantly harder than they have any right to be". Recently, Git has had competition in the form of the Jujutsu version-control project, which has made the Git project take a hard look at what it is doing. (LWN covered Jujutsu in January 2024.)
Jujutsu is a Git-compatible, Rust-based project started by Martin von Zweigbergk. It has a growing community, and Steinhardt said that "many people seem to prefer the Jujutsu experience way more" than using Git. That is not much of a surprise, he said; Git's user interface has grown organically over two decades. It has "inconsistencies and commands that just don't feel modern". Jujutsu, on the other hand, started from scratch and learned from Git's mistakes.
Early on, Steinhardt said he had looked at Jujutsu and found it confusing. "It just didn't make sense to me at all, so I simply discarded it." However, after noticing that there was a steady influx of people who did like it, he opted for another look. That time, something clicked. "That moment when you realize that a tool simply fixes all the UI issues that you had and that you have been developing for the last 20 years was not exactly great." He had two options: despair or learn from the competition. He chose to learn from it.
There are a number of things that Jujutsu got right, he said. For example, history is malleable by default. "It's almost as if you were permanently in an interactive rebase mode, but without all the confusing parts." When history is rewritten in Jujutsu, all dependents update automatically: "so if you added a commit, all children are rebased automatically". Conflicts are data, not emergencies: "You can commit them and resolve them at any later point in time." These features are nice to have, he said, and fundamentally change how users think about commits. "You stop treating them as precious artifacts and rather start treating them as drafts that you can freely edit".
But, he said, Git is old: the project cannot simply revamp its UI completely and break users' workflows. There are some things that Git can steal from Jujutsu, though. He discussed the workflow for splitting a Git commit, which involves seven separate commands with Git's current UI. Most users do not know how to do this, he said. The goal is to add several "opinionated subcommands" that make more modern styles of working with merge requests, such as stacked branches, much easier.
This includes two new commands, planned for Git 2.54, "git history split" and "git history reword". Future releases will have more history-editing subcommands and learn more from Jujutsu.
Steinhardt did not have time for questions; he closed the talk by saying that it had been a "whirlwind tour" through what is cooking in Git right now, and hoped that it had provided a clear picture of what the project was up to.
The video for the talk is now available on the FOSDEM 2026 web site. Slides have not yet been published.
[I would like to thank the Linux Foundation, LWN's travel sponsor, for funding my travel to Brussels to attend FOSDEM.]
| Index entries for this article | |
|---|---|
| Conference | FOSDEM/2026 |