We used AI agents to reverse engineer Windows kernel drivers to find zero-days. It worked better than expected. Which is bad.
(This blog post was written by hand, and edited with AI assistance.)
Frontier LLMs have already proven themselves for security audits of open source projects. AI-powered researchers found tens (maybe hundreds?) of critical CVEs human researchers missed for years.
Eyal Kraft and I had an even bigger concern: what about all those targets no one has ever looked at? There are literally TBs of binaries running on millions of machines across the world no human researcher has ever bothered to look at. Most likely no one ever will.
We decided to put agent swarms to the test by building a simple harness to perform binary zero-day research at scale. We targeted Windows Kernel Drivers. Thousands of third-party .sys files ship on every OEM machine, each one cryptographically signed by both its vendor and MSFT, with questionable code quality.
To date, most security research and mitigation efforts around Windows drivers have focused on intentionally malicious or rootkit-style drivers: binaries that expose unsafe APIs like MmMapIoSpace or raw MSR/port I/O as a deliberate backdoor. Microsoft’s vulnerable driver blocklist and community projects like LOLDrivers catalog these known-bad drivers and prevent them from being loaded. But this misses an entire class of vulnerabilities: unintentional memory corruption bugs inside otherwise legitimate drivers. These aren’t backdoors, but rather your garden-variety bugs written by legitimate vendors. No blocklist catches them because no one has ever audited their code, and preventing them from loading will prevent you from using your GPU, keyboard, or webcam.
To do this in a cost-effective manner, we built an autonomous platform that scrapes drivers from all over the internet, catalogs and labels them, decompiles them, and uses agent swarms to identify memory corruption vulnerabilities with minimal token use and maximum plain-old-python. By using SLMs for much of the analysis, the entire project cost us only $600 USD, roughly $3 per analyzed target.
From a dataset of 1,873 binaries, we found 521 potential vulnerabilities across 158 unique driver binaries from dozens of vendors. Of those, we manually confirmed and reported 15 to vendors including Lenovo, Fujitsu, IBM, Intel, AMD, Silicom, NVIDIA, and Dell. They were, unsurprisingly, unresponsive. Although most confirmed that the vulnerability exists (screenshots and/or video proof was always provided), only one vulnerability has been patched and assigned a CVE to date (CVE-2025-65001; we’d like to thank Fujitsu PSIRT for their handling of our submission).
It’s also important to note none of the PSIRTs saw it as their responsibility to alert MSFT of these vulnerable drivers to add them to the vulnerable driver blocklist or revoke their certificate.
After 90+ days of responsibly waiting for a fix on all submissions, we are publishing the full set of analyzed driver hashes so defenders can check whether affected binaries are present in their environments. If you’re a vendor, customer, or security professional interested in more details, you’re welcome to contact us at hi@hexaplex.ai.
Our flow runs a six-stage pipeline:
Scrape: We covered the MSFT update catalog, OEM sites, and public driver repositories. Overall, we collected 1,654 different drivers across 1,873 unique binary versions.
Preprocess: CAB extraction, PE metadata analysis, and catalog signature parsing. This stage computes hashes, identifies driver entry points, and ranks targets by attack surface indicators (IOCTL dispatch complexity, number of device objects, presence of METHOD_NEITHER handlers). To focus our analysis, we filtered out drivers requiring complex setups (mostly non USB/PCIe devices) and old versions (Sorry, WinXP users...).
Analyze: The core loop. For each target, we launch an audit harness (today, we’d probably just use ClaudeCode / OpenClaw with custom skills). A council of LLM agents then iteratively audits the binary:
Decompilation Agent renames unnamed functions, deduces functionality, recovers dynamic calls, using contextual inference so auditors can follow the logic
Attack Surface Agent identifies functions worth auditing based on the decompiled code
Code Audit Agent inspects each target function for memory corruption bugs, walking the recovered call graph to understand data flow
Findings are written as structured JSON with bug type, severity, confidence, impact assessment, and the decompiled code path.
We used a mix of models via OpenRouter, optimizing for vulnerabilities per token rather than per-model accuracy. On average, each target costs roughly $3 in API calls.
Virtualize: (This became the bottleneck once we had a queue of 100+ findings). We created a custom VM-based harness for loading drivers on kernel-debugged Windows machines controlled by agents. Some drivers require custom USB/PCIe devices we obviously don’t have on hand. We customized QEMU to expose virtual devices, and used LLMs to virtualize enough of the initialization handshake for drivers to load and expose their IOCTL interfaces.
Validate: Using our harness, we can iteratively create Python PoC scripts per finding, effectively performing guided fuzzing until the machine crashes. The BSOD crash dump is then analyzed to confirm that the crash was actually caused by the reported vulnerability. We found many hallucinated reports where the fuzzer successfully caused a crash without exercising the reported finding, a sneaky false positive we had to debug manually.
Report: We manually validated the automated report, running our PoC script on a “real” Windows 11 machine and ensuring the vulnerability description is comprehensive and factually correct. We then submitted it to the PSIRT ourselves. Note that we did not actually weaponize most vulnerabilities into full LPE exploits, but instead gauged the likelihood of it being exploitable in real-world scenarios based on our experience.
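The structured findings emitted by the Analyze stage above could look roughly like the following sketch. The post only names the fields (bug type, severity, confidence, impact assessment, decompiled code path); the field names, the target identifier, the allowed value sets, and the example values here are assumptions for illustration, not our actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Assumed value sets, taken from the rating levels mentioned in this post.
ALLOWED_SEVERITY = {"Critical", "High", "Medium", "Low"}
ALLOWED_CONFIDENCE = {"High", "Medium", "Low"}


@dataclass
class Finding:
    target: str            # hypothetical identifier for the analyzed binary
    bug_type: str          # e.g. "Heap Overflow", "Arbitrary Read"
    severity: str          # one of ALLOWED_SEVERITY
    confidence: str        # one of ALLOWED_CONFIDENCE
    impact: List[str]      # impact categories; they may overlap
    code_path: List[str]   # decompiled functions from entry point to bug


def finding_to_json(finding: Finding) -> str:
    """Validate the rating fields, then serialize the finding as JSON."""
    if finding.severity not in ALLOWED_SEVERITY:
        raise ValueError(f"unknown severity: {finding.severity!r}")
    if finding.confidence not in ALLOWED_CONFIDENCE:
        raise ValueError(f"unknown confidence: {finding.confidence!r}")
    return json.dumps(asdict(finding), indent=2)


if __name__ == "__main__":
    # All values below are made up for the example.
    example = Finding(
        target="example.sys",
        bug_type="Heap Overflow",
        severity="High",
        confidence="Medium",
        impact=["Denial of Service", "Privilege Escalation"],
        code_path=["DispatchDeviceControl", "HandleQueueOp"],
    )
    print(finding_to_json(example))
```

Rejecting unknown rating values at write time keeps later aggregation (severity and confidence breakdowns) from silently absorbing free-form model output.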
Of the 1,654 drivers in our dataset, we selected 202 high-risk drivers for full analysis based on preprocessing heuristics. The remaining drivers are queued for future passes.
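As a rough illustration of the kind of preprocessing heuristic involved, the sketch below decodes Windows IOCTL codes using the standard CTL_CODE bit layout (device type, access, function, transfer method) and flags METHOD_NEITHER handlers. The scoring function and its weights are hypothetical and only show how such indicators could be combined; they are not the heuristics we actually used.

```python
from dataclasses import dataclass
from typing import Iterable

# Transfer method values from the Windows CTL_CODE macro (low two bits).
METHOD_BUFFERED = 0
METHOD_IN_DIRECT = 1
METHOD_OUT_DIRECT = 2
METHOD_NEITHER = 3

FILE_ANY_ACCESS = 0  # RequiredAccess value meaning no specific access is required


@dataclass(frozen=True)
class IoctlCode:
    device_type: int
    access: int
    function: int
    method: int


def decode_ioctl(code: int) -> IoctlCode:
    """Split a 32-bit IOCTL code into its CTL_CODE fields."""
    return IoctlCode(
        device_type=(code >> 16) & 0xFFFF,
        access=(code >> 14) & 0x3,
        function=(code >> 2) & 0xFFF,
        method=code & 0x3,
    )


def attack_surface_score(ioctl_codes: Iterable[int], device_object_count: int) -> int:
    """Hypothetical ranking score; the weights are illustrative assumptions."""
    decoded = [decode_ioctl(code) for code in ioctl_codes]
    neither = sum(1 for d in decoded if d.method == METHOD_NEITHER)
    any_access = sum(1 for d in decoded if d.access == FILE_ANY_ACCESS)
    # More handlers, raw user pointers (METHOD_NEITHER), and handlers that
    # demand no specific access all push a target up the queue.
    return len(decoded) + 5 * neither + 2 * any_access + device_object_count


if __name__ == "__main__":
    # 0x222003 == CTL_CODE(0x22, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS)
    print(decode_ioctl(0x222003))
    print(attack_surface_score([0x222003, 0x222000], device_object_count=1))
```

METHOD_NEITHER gets the heaviest weight in this sketch because the I/O manager passes the caller's raw user-mode pointers to the driver, leaving all probing and validation to the driver itself.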
Total binaries collected: 1,873
Total unique drivers: 1,654
Binaries analyzed: 202
Total findings: 521
Unique binaries with findings: 158
Findings manually confirmed and reported: 15
Vendors notified: 8
CVEs assigned: 1 (CVE-2025-65001)
Total project cost: ~$600
Cost per target: ~$3
Cost per bug: ~$4
Arbitrary Read: 144 (27.6%)
Heap Overflow: 111 (21.3%)
Arbitrary Write: 92 (17.7%)
Other: 82 (15.7%)
Integer Overflow: 33 (6.3%)
Stack Overflow: 26 (5.0%)
Use-After-Free: 22 (4.2%)
Type Confusion: 11 (2.1%)
Arbitrary memory access bugs (read + write) account for 45.3% of all findings, unsurprising given that IOCTL handlers routinely copy data between user and kernel buffers with no bounds checking. Most of those stem from either heap buffer mishandling or faulty deserialization logic.
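For readers who want to re-derive these percentages, here is a small sketch that recomputes them from the per-class counts listed above. The counts come from this post; the function names are just for illustration.

```python
# Per-class finding counts as listed in this post (521 findings in total).
BUG_CLASS_COUNTS = {
    "Arbitrary Read": 144,
    "Heap Overflow": 111,
    "Arbitrary Write": 92,
    "Other": 82,
    "Integer Overflow": 33,
    "Stack Overflow": 26,
    "Use-After-Free": 22,
    "Type Confusion": 11,
}


def shares(counts: dict) -> dict:
    """Return each class's share of all findings, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def combined_share(counts: dict, names: list) -> float:
    """Share of findings that fall into any of the named classes."""
    total = sum(counts.values())
    return round(100 * sum(counts[n] for n in names) / total, 1)


if __name__ == "__main__":
    for name, pct in shares(BUG_CLASS_COUNTS).items():
        print(f"{name}: {pct}%")
    print(combined_share(BUG_CLASS_COUNTS, ["Arbitrary Read", "Arbitrary Write"]))
```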
Critical: 149 (28.6%)
High: 220 (42.2%)
Medium: 149 (28.6%)
Low: 3 (0.6%)
The agent provided severity and confidence ratings. 70.8% of findings are rated High or Critical, with 78.7% of findings having High confidence, 19.6% Medium, and only 1.7% Low.
After manual analysis, we estimate the false positive rate at approximately 60% of critical/high confidence findings. Most are real code patterns where the bug exists but exploitation is impractical (e.g., an OOB read that can only leak padding bytes, or a write that requires special system conditions). Adjusting for this, we estimate over 100 of the 521 findings represent genuinely exploitable user→kernel local privilege escalation on current Windows 11 x64 systems.
To reduce the hallucination rate, the agent was asked to state the maximum impact of exploitation for each reported finding. The most common impact categories across findings were (categories overlap; a single finding can cause both DoS and privilege escalation):
Denial of Service: System crash via kernel bugcheck (BSOD). The dominant impact across nearly all findings, since any kernel memory corruption is at minimum a reliability issue.
Information Disclosure: Kernel memory leak via uninitialized buffers or OOB reads, often sufficient for KASLR bypass.
Privilege Escalation: Arbitrary kernel read/write primitives enabling token manipulation or kernel code execution.
Code Execution: Heap overflow or use-after-free conditions exploitable for arbitrary kernel code execution via pool shaping.
As an example, AMD’s Crash Defender driver (amdfendr.sys) exposes a world-writable device that supports sending IOCTLs with proprietary operation codes. Those operations on the device access internal transport queue descriptors without proper size validations, allowing heap corruption. With pool grooming, this is a path to arbitrary kernel data access, or even kernel code execution. Even without it, it’s a reliable BSOD from any user account.
The driver ships on Windows AMD systems, including AWS EC2 Windows AMIs with AMD instances, meaning the attack surface extends to cloud workloads.
The same common binary exploitation patterns account for the majority of findings. It is unclear whether it is because driver dev teams skip modern security tooling entirely, or because the highly polymorphic nature of both WDK and KMDF makes pre-LLM static analysis unfeasible.
We manually confirmed and reported 15 vulnerabilities to 8 vendors, all with CVSS ratings of 7+. All have been under disclosure for over 90 days.
Average CVSS across reported findings: 8.2 (High)
Most bugs are exploitable from a standard, unprivileged user account. Most require nothing more than opening the device handle (often world-accessible) and sending a sequence of DeviceIoControl calls. Because these drivers are MSFT-signed, they can also be side-loaded on any Windows machine via Bring Your Own Vulnerable Driver (BYOVD) attacks, even if the associated product was never installed. Loading a driver requires admin privileges, so in that scenario the LPE becomes a more restrictive admin→kernel.
Response from vendor PSIRTs has been disappointing. Several vendors rejected our reports outright despite video proof-of-concept demonstrations of exploitation. Others acknowledged that the products containing these drivers have reached End-of-Life, but have not revoked the driver signing certificates, leaving the BYOVD attack surface intact on every Windows machine.
To date, only Fujitsu PSIRT has successfully patched and assigned a CVE (CVE-2025-65001). We continue to monitor and will update if additional CVEs are assigned.
Agent-assisted binary vulnerability research works, and it’s cheap. $600 and a few weeks produced an agentic loop that did work a human team would need years of focused reverse engineering to match. The per-bug cost of $4 means any motivated attacker can afford to scan the entire Windows driver ecosystem.
Agentic flows require closed loops. Although we performed this research before Opus-4.6 and GPT-5.3, the biggest performance leap was achieved by “closing the loop” and giving the agent direct feedback on exploitation success using our VM-based kernel-debugging harness. Agents that can try to bugcheck the machine over and over again are tomorrow’s fuzzers, and with enough compute they’re 100x more dangerous in the wrong hands.
The Windows kernel driver ecosystem is in worse shape than most people assume. Third-party kernel drivers remain one of the last bastions of C code running in kernel-mode with minimal security review. Code signing guarantees authenticity, not security. Our 60% false positive rate still leaves over 100 likely-exploitable privilege escalation bugs across mainstream vendor drivers from companies like AMD, Intel, NVIDIA, Dell, Lenovo, and IBM.
PSIRT processes are not built for this volume. Most vendor security teams are structured to handle a handful of reports per quarter from human researchers. When an automated system produces dozens of valid findings across multiple product lines simultaneously, the existing intake processes fail. Reports get rejected, misrouted, or deprioritized. The gap between discovery rate and remediation rate will only widen as these tools improve.
We’re publishing our hash list for customers to defend themselves. We’re past responsible disclosure timelines, so we’re making IOCs publicly available in a safe manner. If you’re a PSIRT, CERT, or security team that needs more information, contact us at hi@hexaplex.ai and we’ll figure it out.
The hashes below are “double-hashed”: each value is SHA256(SHA256(file_bytes)). We publish them this way so attackers can’t use this list to look up and download vulnerable binaries from VirusTotal or similar services. If you have a driver file on disk, you can check it against this list locally. We will provide comprehensive IOCs directly to PSIRTs, CERTs, and security teams upon request. Contact us at hi@hexaplex.ai.
To check a driver on your system:
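A minimal Python sketch of such a check follows. The post specifies SHA256(SHA256(file_bytes)) but not whether the outer hash is taken over the raw 32-byte inner digest or over its hex string, so this sketch tries both; that choice, the function names, and the example path are assumptions rather than an official tool.

```python
import hashlib
from typing import Iterable


def double_sha256_file(path: str, inner_as_hex: bool = False,
                       chunk_size: int = 1 << 20) -> str:
    """Return SHA256(SHA256(file_bytes)) as lowercase hex.

    inner_as_hex=False hashes the raw 32-byte inner digest (assumed default);
    inner_as_hex=True hashes the ASCII hex string of the inner digest.
    """
    inner = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large driver packages do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            inner.update(chunk)
    if inner_as_hex:
        second_input = inner.hexdigest().encode("ascii")
    else:
        second_input = inner.digest()
    return hashlib.sha256(second_input).hexdigest()


def is_listed(path: str, published_hashes: Iterable[str]) -> bool:
    """Check a local file against the published double-hash list."""
    normalized = {h.strip().lower() for h in published_hashes}
    return (
        double_sha256_file(path) in normalized
        or double_sha256_file(path, inner_as_hex=True) in normalized
    )


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python check_driver.py example.sys hashes.txt
    driver_path, list_path = sys.argv[1], sys.argv[2]
    with open(list_path, "r", encoding="utf-8") as fh:
        hashes = [line for line in fh if line.strip()]
    print("LISTED" if is_listed(driver_path, hashes) else "not listed")
```

Everything runs locally: the file is hashed on disk and only compared against the list, so nothing is uploaded to a third-party service.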
Full list:

