Predictions that LLM tools would cause a surge in reports of security vulnerabilities have, unquestionably, been borne out. As expected, maintainers are having to wade through more security reports than ever before; LLM tools are also disrupting traditional coordinated-disclosure practices. The method of Copy Fail's disclosure, in particular, left vendors, projects, and users scrambling. In addition, maintainers are seeing parallel discovery of the same security flaws within the embargo window. Both of these developments mean that coordinated security disclosures may become a thing of the past.
"Mining security gold"
Jeremy Stanley, a member of the vulnerability management
team for the OpenStack
cloud-computing project, brought this
topic to the OSS Security mailing list on April 28. He said that
projects he worked on "are under a seemingly unending deluge of reports from
researchers using LLMs to mine for security gold in our software". That had
led to him thinking about the risks that public LLM services might pose to
traditional vulnerability-handling workflows, embargoes, and coordinated
disclosure of security flaws. He wondered if it would be helpful to keep
embargoes short to mitigate the risks of parallel discovery or disclosure to
other LLM users.
I'm sorely tempted, both due to the increased volume and the risk of premature disclosure, to just assume that any vulnerability reported as a result of research using an LLM is trivially discoverable by others, and give up trying to pretend there's any point to working it under embargo. Similarly, it makes sense to me that patch development and descriptive prose shouldn't be produced with LLM assistance for any vulnerability that is being worked under an embargo.
He said that he could not be the only one thinking about this topic, and
asked what others thought. Jacob Bachmeyer replied
that parallel discovery was a big risk: "If an LLM can find a bug for a
whitehat, it can do the same for a blackhat. [...] LLM-discovered
vulnerabilities should be considered already publicly known".
Effect on embargoes
If the vulnerabilities are (arguably) known to multiple people, the question
arises of whether standard practices around security embargoes still make sense.
Lucas Holt said
that he saw "the logic and temptation" of shortening embargoes, but dropping
a zero-day exploit on a small project would not help anyone. He suggested that
large projects may have multiple people looking for and discovering the same
flaws, but smaller projects were unlikely to have multiple people probing them
for vulnerabilities at the same time.
Stanley clarified that
he was approaching the problem from the perspective of "an upstream
maintainer and vulnerability coordinator of large/popular projects receiving
these reports". He was seeing a flood of reports, and trying to manage
all of them in private was leading to "accidents breaking our embargoes
before things are ready or distros/deployers have been given sufficient advance
warning". If security reports were public immediately, he thought projects could
crowdsource help from a larger portion of their community "rather than
relying solely on overwhelmed vulnerability coordinators and security-focused
maintainers".
Holt had suggested that people who used LLMs to find security bugs could also
use their tools to create patches. Brian May warned
people to be careful of that approach. "Simple patches that look good can in
fact be hiding serious security issues. Thinking of the September 2006 Debian openssl
issue here."
Greg Dahlman also
thought that LLMs were unreliable when it comes to creating security
fixes. He described the ability of current LLMs to produce correct and
secure code as being within "the coin flip range"; in other words, he
thought there was only a 50-50 chance that an LLM's suggested solution to fix a
flaw would be adequate. Therefore, any embargo timeline needed to take into
account the asymmetry between the time it takes to discover a flaw and the time
needed to actually fix it.
Already happening
It turns out that Stanley's hypothesis about parallel discovery had already been
proven in the wild. Clemens Lang, who works on the Red Hat Enterprise Linux
(RHEL) crypto team, provided
a data point to support the theory: "We're seeing duplicate reports of
the same issue found by multiple independent groups that use LLMs, within the
embargo period." Greg Kroah-Hartman also
reported that kernel developers were "seeing duplicate reports of
the same issue from different groups within the time period it takes to get a
fix merged".
Willy Tarreau said that he had predicted the death of security embargoes months ago:
Embargoes now play against security, for all the time we don't act, users stay exposed to anyone having the luck to find the same problem. It's not a matter of the LLM's strength but a matter of determination by the researcher who could simply run a small model several times helping it dig further. Bigger models just find faster, but that only counts for those seeking protection, not for those trying to attack.
Copy Fail disclosure fail
The Copy Fail privilege-escalation vulnerability (CVE-2026-31431) was announced on
April 29 by Xint. It is a "straight-line logic flaw", meaning that
it does not require race conditions or other special circumstances to exploit;
it allowed local users to trivially get root access on most Linux distributions
unless they were running the most up-to-date kernels.
Along with the announcement, there was a proof-of-concept (POC) Python script that allowed users to see if their systems were vulnerable. The fix had already been included with the 7.0, 6.19.12, and 6.18.22 kernels, but was not available as a backport to older stable kernels until April 30. Most Linux distributions were caught with their proverbial pants down. On April 30, with a POC showing how to exploit the vulnerability widely available, major distributions such as Debian, Red Hat Enterprise Linux, SUSE, and Ubuntu had no fix ready for their users.
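For readers who want to compare a kernel version string against the fixed releases listed above, a minimal sketch follows. It is an illustration rather than a vulnerability scanner. The names `FIXED_RELEASES`, `parse_kernel_version()`, and `copy_fail_status()` are invented for this example, and the table contains only the three releases named in this article. Distribution kernels often carry backported fixes without changing the upstream version number, and the article does not say which older series are affected, so anything outside the table is reported as "unknown" rather than guessed at.

```python
import re

# First release in each stable series that carries the fix, as named in
# this article. Keys are (major, minor); values are the patch level.
# Assumption: this table is illustrative and not an authoritative list.
FIXED_RELEASES = {
    (7, 0): 0,
    (6, 19): 12,
    (6, 18): 22,
}


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Parse strings like '6.18.22-generic' or '7.0' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def copy_fail_status(release: str) -> str:
    """Return 'fixed', 'predates-fix', or 'unknown' for an upstream version string."""
    # Release candidates may predate the mainline commit, so do not guess.
    if "-rc" in release:
        return "unknown"
    major, minor, patch = parse_kernel_version(release)
    # Assumption: any mainline series newer than 7.0 inherits the fix.
    if (major, minor) > (7, 0):
        return "fixed"
    first_fixed = FIXED_RELEASES.get((major, minor))
    if first_fixed is None:
        # Older stable series: backport versions are not given in the article.
        return "unknown"
    return "fixed" if patch >= first_fixed else "predates-fix"
```

On a running system, the string returned by Python's `platform.release()` could be passed to `copy_fail_status()`, with the caveat that a "predates-fix" or "unknown" result only says something about the upstream version number, not about whether a distribution has patched its kernel.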
It was described
as "one of the worst make-me-root vulnerabilities in the kernel in recent
times" by Eddie Chapman on the OSS Security list. He wondered what went wrong:
Has the embargo been broken early today? Not looking to point any fingers, those who make things happen in our communities work dam hard and deserve respect and support, especially with the extra burden of AI slop now.
Gentoo contributor Sam James said that
it's up to security reporters to notify Linux distributions of kernel
vulnerabilities: "unless the reporter chooses to bring it to the
linux-distros [mailing list], there is no heads-up to distributions. It did not
happen here."
The discussion turned into something of an ad-hoc postmortem, finger-pointing, and problem-solving session. Alexander Peslyak, who goes by "Solar Designer", said that distributions had little way of knowing about the importance of this vulnerability, as it did not stand out among all the others.
The vulnerability was reported to the kernel team on March 23,
according to the timeline in Xint's announcement. A patch was committed to the
mainline kernel on April 1. CVE-2026-31431
was added to the kernel's security repository on April 25, with a
Common Vulnerability Scoring System (CVSS) score of 7.8 (out of a possible
ten), four days before the Copy Fail announcement. Peslyak pointed out that
there were 168 CVEs in the batch from that day, with scores between 7.1 and
9.8. "By the score alone, this one really does not stand out. To me, this is
usual noise, with little signal in there." The score did not hint at the severity
or imminent threat of a POC being released. In fact, 21 of the CVEs added on
April 25 have a "9.8 CRITICAL" CVSS score.
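The intervals in this timeline are easy to lose track of, so here is a small sketch that computes them from the dates given in the article. The year 2026 is an assumption inferred from the CVE identifier, and the names `timeline` and `gaps()` are made up for this example.

```python
from datetime import date

YEAR = 2026  # assumption: inferred from the CVE-2026-31431 identifier

# Events and dates as reported in this article.
timeline = [
    ("reported to the kernel team", date(YEAR, 3, 23)),
    ("fix committed to mainline", date(YEAR, 4, 1)),
    ("CVE added to the kernel's security repository", date(YEAR, 4, 25)),
    ("Copy Fail announced with POC", date(YEAR, 4, 29)),
    ("backports available for older stable kernels", date(YEAR, 4, 30)),
]


def gaps(events):
    """Return (earlier event, later event, days between) for consecutive events."""
    return [
        (earlier[0], later[0], (later[1] - earlier[1]).days)
        for earlier, later in zip(events, events[1:])
    ]


for earlier, later, days in gaps(timeline):
    print(f"{days:>3} days: {earlier} -> {later}")

# Total time from the initial report to the public announcement.
report_to_announcement = (timeline[3][1] - timeline[0][1]).days
print(f"{report_to_announcement} days from report to announcement")
```

By this arithmetic, the fix sat in the mainline kernel for 28 days before the announcement, while the CVE entry that might have alerted distributions existed for only four of them.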
Greg Kroah-Hartman replied
that the kernel team's "constant message", for decades, has been that
users must upgrade to the latest release to ensure that they have all of the
fixes for currently known issues. He added that the kernel team had no knowledge
of the Copy Fail announcement ahead of time "as no one is obligated to tell
us that they are about to let loose a trivial exploit", and that the team
was not allowed to notify others ahead of time, lest they have to tell everyone
about everything. "That's the only policy by which all the legal/governmental
agencies have agreed to allow us to operate in, so we are stuck with
it."
On May 3, James shared a
link to a
comment from Brian Pak of Xint that said the company had provided a fully
working exploit to the kernel security team when the vulnerability was
reported. "We've since learned that such details don't automatically get
forwarded downstream and that Linux kernel commit messages are typically kept
minimal. That's simply how the process works." James said that the kernel
team "were very much aware of the impact from the offset", and inquired if
the kernel team was "honestly proud of how this went".
The CVE garbage patch
Kroah-Hartman asked
exactly what James would suggest that the kernel team do better, and how it should
do it. He added that the team receives bug reports of local-user privilege-escalation
bugs all the time; it was not obvious that this one was special until
after the fact because the submitter used it to show off their
software. "That is something that normally does not happen and is outside of
the control of all of us involved here." Emily Shepherd wanted
to know if the POC or a description of the flaw's severity had been provided;
Kroah-Hartman replied
that he honestly did not remember because "that was months and hundreds, if not
thousands, of reports ago". He also reminded the list how the kernel security
team operates:
The job of the kernel security team is to triage a bug report, drag in the relevant maintainer/developer, get the issue fixed and merged into Linus's tree as soon as possible. Once it lands in Linus's tree, our role is over.
We do not do "announcements" of anything to anyone, so even if this was a "look how bad you can abuse the system" type of thing, we would not be telling anyone anything.
He added that he has documented this in detail, given many talks about it, and blogged about it as well.
On social media, Josh Bressers suggested
that the blame lies with the company: "every AI vulnerability company wants
to find something juicy, and have no idea how to coordinate the
findings". Kroah-Hartman dismissed
the idea that coordination of vulnerabilities was even possible
now. Bressers agreed:
"I do think you're right that the traditional disclosure model is gone
forever". He did think it was "pretty obvious" that Copy Fail would
be a big one; but he had no idea what to do to prevent important vulnerabilities
from drowning in the "great CVE garbage patch".
Copy Fail is just one vulnerability out of thousands. At the rate that LLM tools seem to be speeding up discovery, it does seem that Kroah-Hartman has a point. Traditional disclosure notification is becoming increasingly difficult, if not outright impossible. The volume of reports, coupled with the fact that many AI-assisted reporters will be unfamiliar with how security disclosure usually works, means that relying on embargoes and coordinated fixes is going to be increasingly risky. We live in interesting times for open-source security, whether we want to or not.