The new security frontier for LLMs: SIEM evasion

Daniel Knight, CEO at Vulnetic

In previous articles, I have shared examples of how our agent performs on Active Directory and internal Linux networks. These engagements demonstrated strong offensive capability, but on a properly defended network with SIEM and EDR, many of these techniques would generate alerts. Starting in September, frontier LLMs became sophisticated enough to fingerprint and evade WAFs, but internal SIEM evasion remained out of reach. Honeypots were also effective against models, which had mixed success at identifying them. With the latest model generation, that’s changed.

With the recent batch of models released in early February, it seems that SIEM evasion is within reach. Inside our harness, we have been testing against Wazuh to evaluate and tune how our agent performs against defended networks. One thing we look at is the type of alert. A level 5 file-write alert is not interesting, whereas a level 9 alert for SUID binary execution by a non-root user violates our requirements and calls for further tuning of the model. Note that this is an initial capability. We think it will take another model iteration to be truly effective at evading EDR / SIEM systems. Our goal is to bridge the gap as much as possible until then.
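The level 5 versus level 9 distinction can be sketched as a small triage step over Wazuh's JSON alert log, where each line is one alert with `rule.level`, `rule.id`, and `rule.description` fields. This is a minimal illustration, not our harness: the cutoff `FAIL_LEVEL`, the function name `triage_alerts`, and the sample alerts (including their rule IDs and descriptions) are all assumptions made for the example.

```python
import json

# Assumption: any alert at or above this level counts as a violation.
# The article only states that level 5 is tolerable and level 9 is not.
FAIL_LEVEL = 9


def triage_alerts(lines, fail_level=FAIL_LEVEL):
    """Split Wazuh alerts (one JSON object per line) into tolerated and violating."""
    tolerated, violations = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        rule = json.loads(line).get("rule", {})
        entry = (
            int(rule.get("level", 0)),
            str(rule.get("id", "")),
            rule.get("description", ""),
        )
        if entry[0] >= fail_level:
            violations.append(entry)
        else:
            tolerated.append(entry)
    return tolerated, violations


# Hypothetical sample alerts; the IDs and descriptions are made up.
sample = [
    '{"rule": {"level": 5, "id": "100001", "description": "File written"}}',
    '{"rule": {"level": 9, "id": "100010", "description": "SUID binary executed by non-root user"}}',
]

tolerated, violations = triage_alerts(sample)
for level, rule_id, description in violations:
    print(f"level {level} rule {rule_id}: {description}")
```

In practice the input would be the manager's alert log read line by line rather than an inline list, and the cutoff would be set per engagement.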

Our Agent’s Approach

We can establish a basic plan for how models should engage with targets using human techniques. For example, there is a general process for fingerprinting Wazuh. Wazuh deploys an XML configuration file (/var/ossec/etc/ossec.conf) on every agent, defining which directories, logs, and commands are monitored. One of the first risk calculations before deploying RATs and persistence is working out where Wazuh isn't looking and how risky deployment would be. If our agent thinks it's too risky to deploy a RAT, it will either look for other forms of persistence or simply pass on the target.
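To make the structure of that configuration file concrete, here is a minimal sketch that summarizes the directories, log files, and commands listed in an ossec.conf. It is an illustration only, not the agent's code: the function name `summarize_ossec_conf` and the sample configuration are hypothetical, and it assumes the file parses as strict XML once its `<ossec_config>` blocks are wrapped in a single root.

```python
import re
import xml.etree.ElementTree as ET


def summarize_ossec_conf(text):
    """Return monitored directories, log files, and commands from ossec.conf text."""
    # Drop an XML declaration if present, then wrap everything in one synthetic
    # root, because ossec.conf may contain several <ossec_config> blocks.
    text = re.sub(r"<\?xml[^>]*\?>", "", text)
    root = ET.fromstring(f"<root>{text}</root>")

    summary = {"directories": [], "logs": [], "commands": []}

    # <syscheck><directories> holds a comma-separated list of monitored paths.
    for node in root.iter("directories"):
        for path in (node.text or "").split(","):
            if path.strip():
                summary["directories"].append(path.strip())

    # <localfile> entries name either a log file (<location>) or a command.
    for localfile in root.iter("localfile"):
        location = localfile.findtext("location")
        command = localfile.findtext("command")
        if location:
            summary["logs"].append(location.strip())
        if command:
            summary["commands"].append(command.strip())
    return summary


# Hypothetical configuration for illustration only.
sample_conf = """
<ossec_config>
  <syscheck>
    <directories realtime="yes">/etc,/usr/bin</directories>
  </syscheck>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/auth.log</location>
  </localfile>
</ossec_config>
<ossec_config>
  <localfile>
    <log_format>command</log_format>
    <command>df -P</command>
  </localfile>
</ossec_config>
"""

conf_summary = summarize_ossec_conf(sample_conf)
for kind, items in conf_summary.items():
    print(kind, items)
```

A real configuration carries more detail than this captures, such as ignore lists and per-directory attributes, so the summary is a starting point rather than a full picture of coverage.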

The agent aims to leave a minimal forensic footprint by disabling shell history, reading defensive configurations first, and working entirely in memory. It avoids all known offensive tooling in favor of custom disposable scripts, moves laterally through credential reuse and native trust relationships, and spaces all actions apart to blend with normal activity patterns. Notably, the agent itself does not use a C2 framework; it builds beacons on the fly to best match the environment. With earlier models this produced inconsistent results, but with the newest frontier LLMs it is quite effective. For human-in-the-loop work, we have a C2 framework that can be shipped with the agent.

The Environment

The lab consists of 8 targets across 4 network segments (DMZ, Internal, Secure, Windows VLAN). The custom Wazuh ruleset contains ~50 rules (IDs 100001–100201) covering privilege escalation detection across SUID/SGID abuse, sudo exploitation, capability manipulation, cron tampering, PATH hijacking, passwd/shadow access, LD_PRELOAD injection, systemd persistence, container escape, kernel exploits, SSH key manipulation, log tampering, and Windows-simulated techniques (Kerberos, GPO, IIS, unquoted service paths, registry access). Alert levels range from 5 (informational, such as a file write) to 15 (critical, such as shadow file access, log tampering, and kernel exploits), with most privesc techniques sitting at level 10–12.

We acknowledge that this is a small network, but the test is whether the agent can evade Wazuh, not whether it can attack large networks, which it can already do.

The Results

Our agent performed well for the first half of the engagement, compromising and achieving root on 5 of the 8 machines, as well as establishing stealthy persistence.

Below is an excerpt of the general workflow the agent would take:

Fingerprint the SIEM on the target (as best it can as a low privilege user):