The new security frontier for LLMs; SIEM evasion

6 min read Original article ↗

Daniel Knight

Daniel Knight, CEO at Vulnetic

In previous articles, I have put forth examples of how our agent performs on Active Directory and internal Linux networks. These engagements demonstrated strong offensive capability, but on a properly defended network with SIEM and EDR, many of these techniques would generate alerts. Starting in September, frontier LLMs became sophisticated enough to fingerprint and evade WAF, but internal SIEM evasion remained out of reach. Honeypots were also effective against models, with mixed abilities to identify them. With the latest model generation, that’s changed.

With the recent batch of models released in early February, it seems that SIEM evasion is within reach. Inside of our harness, we have been testing against Wazuh to evaluate and tune how our agent performs against defended networks. One thing we look for is the type of alert. A level 5 write file alert is not interesting whereas a level 9 SUID binary execution by non-root violates our requirements and requires further tuning on the model. One thing to note is that this is an initial capability. We think it will take another model iteration to be truly effective at evading EDR / SIEM systems. Our goal is to bridge the gap as much as possible until then.

Our Agent’s Approach

We can establish a basic plan for how models should engage with targets using human techniques. For example, there is a general process for fingerprinting Wazuh. Wazuh deploys an XML configuration file (/var/ossec/etc/ossec.conf) on every agent, defining which directories, logs, and commands are monitored. One of the first risk calculations for deploying RATs and persistence is where Wazuh isn't looking, but also how risky deployment is. If our agent thinks it's too risky to deploy a RAT, it will either look for other forms of persistence or should simply pass on the target.

The agent operates with zero forensic footprint by disabling history, reading defensive configurations first, and working entirely in memory. It avoids all known offensive tooling in favor of custom disposable scripts, moves laterally through credential reuse and native trust relationships, and spaces all actions apart to blend with normal activity patterns. One important thing to note is we do not use C2 frameworks. The agent builds beacons on the fly to best match the environment. With earlier models this produced inconsistent results, but with the newest frontier LLMs it is quite effective. For human in the loop work, we have a C2 framework that can be shipped with the agent.

The Environment

The lab consists of 8 targets across 4 network segments (DMZ, Internal, Secure, Windows VLAN) and the custom Wazuh ruleset contains ~50 rules (IDs 100001–100201) covering privilege escalation detection across SUID/SGID abuse, sudo exploitation, capability manipulation, cron tampering, PATH hijacking, passwd/shadow access, LD_PRELOAD injection, systemd persistence, container escape, kernel exploits, SSH key manipulation, log tampering, and Windows-simulated techniques (Kerberos, GPO, IIS, unquoted service paths, registry access). Alert levels range from 5 (informational like a file write) to 15 (critical like shadow file access, log tampering, and kernel exploits), with most privesc techniques sitting at level 10–12.

We acknowledge that this is a small network, but the test is whether the agent can evade Wazuh, not if it can attack large networks, which it can do already.

The Results

Our agent performed well for the first half of the engagement, compromising and achieving root on 5 of the 8 machines, as well as establishing stealthy persistence.

Get Daniel Knight’s stories in your inbox

Join Medium for free to get updates from this writer.

Remember me for faster sign in

Below is an excerpt of the general workflow the agent would take:

  1. Fingerprint the SIEM on the target (as best it can as a low privilege user):

Press enter or click to view image in full size

Agent analyzing Wazuh configuration for blindspots

2. Enumerate and exploit the privilege escalation vector without compromising security:

Press enter or click to view image in full size

Agent using gcc on the target to compile malicious libraries to get root

3. Once it has root, perform further enumeration now that we have access to new files:

Press enter or click to view image in full size

Agent fingerprinting Wazuh once establishing root

4. Establish persistence without being detected:

Press enter or click to view image in full size

Deep reasoning by the agent on the best options

5. The Beacon itself (Plan, Establish, Cleanup, Test):

Press enter or click to view image in full size

Agent planning on how best to deploy the beacon without being detected

Press enter or click to view image in full size

Agent creates and establishes a beacon on the target

Press enter or click to view image in full size

Test beacon to ensure it works

Press enter or click to view image in full size

Agent restoring library to cover up evidence

This represents a layer of risk calculation in combination with analysis not available in previous models. Our agent performed these same actions on 4 other Linux devices on the network, achieving root on all of them and pivoting between multiple subnets.

Once venturing into the Windows VLAN however, we ran into issues where the agent would willingly take risky actions that would flag detection mechanisms.

Press enter or click to view image in full size

This particular sequence demonstrates a lack of fingerprinting SIEM before loud enumeration

Above we can see the agent performing all the commands at once, negating the point of fingerprinting in the first place.

Press enter or click to view image in full size

Reasoning about privilege escalation on windows, negating the concern of Wazuh monitoring db.conf

The end result is that the agent’s evasion capabilities degrade the further it ventures into the network. On the Linux machines it read the Wazuh syscheck config, evaluated persistence paths against it, and made deliberate tradeoffs. On win02, none of that happened. Wazuh was noted as a finding and immediately ignored as a constraint. The most likely explanation is that the agent context is being diluted deeper into engagements, causing it to follow rules less strictly. We are working on strategies to enhance the stickiness of instructions deeper into engagements.

The Future

If models are capable of SIEM evasion, organizations need to assume adversaries will have access to these capabilities soon. It will take further improvements in model instruction-following consistency to achieve this completely, but we are close. Context enhancements will improve the long running rule following required to perform proper stealthy attacks on networks. This is much trickier than standard web pentesting, so it is no surprise the models are only now crossing the threshold of practical usability for this attack class.

In OpenAI’s benchmarks, their raw models are incapable of EDR evasion, so it takes a special harness to be able to do this.

Checkout our hacking agent here

Follow me on X here