GitHub - expanso-io/log-simulators: Realistic log generators for testing data pipelines at volume - web, IoT, syslog, Windows, Cisco ASA, CEF/LEEF, JSON app, cloud audit, Kubernetes, PostgreSQL. Requires only uv.


Realistic log generators for testing data pipelines at volume. Thirteen simulators cover the device types that matter for SIEM and observability pipelines, and each one is a single command that needs only uv.

uvx --from git+https://github.com/expanso-io/log-simulators logsim-web --rate 100

No clone, no install, no Docker. Pipe the output anywhere — a file, a TCP/UDP collector, or straight into an Expanso Edge pipeline.

The simulators

| Tool | Generates | Demo scenario |
| --- | --- | --- |
| logsim-web | Apache/nginx access + error logs (NCSA combined/common/JSON), session-coherent visitors | error-storm: recurring 5xx spikes |
| logsim-iot | IoT sensor telemetry NDJSON: temperature, humidity, pressure, vibration, voltage with drift + diurnal cycles | sensor-fault: spikes, stuck values, dropouts |
| logsim-syslog | RFC 3164 and RFC 5424 syslog with realistic facility/severity mix | auth-burst: failed-login floods |
| logsim-windows | Windows Security Event XML (4624/4625/4688/4672) | brute-force: 4625 password-spray bursts |
| logsim-asa | Cisco ASA firewall syslog: paired build/teardown with consistent connection IDs, denies | port-scan: deny storms from one source |
| logsim-cef | CEF and LEEF security events (firewall/IPS style) | malware-burst: high-severity event waves |
| logsim-app | Structured JSON app logs with trace IDs and realistic embedded PII (for redaction demos) | error-storm, pii-leak |
| logsim-cloud | AWS CloudTrail JSON and VPC Flow Logs | suspicious-login: off-region console logins |
| logsim-k8s | Kubernetes CRI container logs: multi-pod node, klog + JSON apps, partial-line mechanics | crash-loop: restarting pod |
| logsim-postgres | PostgreSQL server logs incl. multiline ERROR/DETAIL/STATEMENT and slow queries | deadlock: lock-contention windows |
| logsim-vmware | VMware vSphere: vCenter (vpxd) task begin/finish + ESXi vmkernel/hostd/vobd, one correlated estate | host-failure: ESXi host drops, vSphere HA restarts its VMs |
| logsim-ics | Industrial/OT network-device syslog: Cisco-IOS-style %FAC-SEV-MNEMONIC from plant switches, PLC comms over PROFINET/MODBUS/DNP3/IEC-104 | plc-comm-loss: a cell-area segment degrades and recovers |
| logsim-retail | Retail point-of-sale transactions (CSV or JSON): stable product catalog, Zipf best-sellers, recurring customers, per-region tax | flash-sale: promoted SKUs surge in volume and discount |
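
The demo scenarios listed above are recurring anomaly windows layered over baseline traffic. As a rough illustration of the idea only (not the suite's scheduler), the sketch below decides whether a moment falls inside such a window. The function names, the 10-minute period, the 60-second window, and the error rates are all hypothetical.

```python
def in_anomaly_window(elapsed_s, period_s=600.0, length_s=60.0, offset_s=300.0):
    """Return True when `elapsed_s` falls inside a recurring anomaly window.

    Windows open at offset_s, offset_s + period_s, ... and last length_s seconds.
    All defaults are illustrative, not values used by the simulators.
    """
    if elapsed_s < offset_s:
        return False
    phase = (elapsed_s - offset_s) % period_s
    return phase < length_s


def error_probability(elapsed_s, baseline=0.01, storm=0.40):
    """Chance that an event is an error: low normally, high during a window."""
    return storm if in_anomaly_window(elapsed_s) else baseline


if __name__ == "__main__":
    for t in (0, 300, 330, 360, 900):
        print(t, in_anomaly_window(t), error_probability(t))
```

Because the window depends only on elapsed time, the same anomalies land at the same offsets on every run, which is what lets a pipeline test assert on them.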

Every tool shares the same CLI contract:

--rate N            average events/sec (Poisson-paced, like real traffic)
--count N           stop after N events (0 = run forever)
--duration 5m       stop after a wall-clock duration
--backfill 24h      synthesize 24h of history at full speed, then exit
--follow            ...then keep streaming live
--start-time ISO    anchor the backfill window (deterministic with --seed)
--seed N            fully reproducible output
--diurnal           overnight trough, midday peak
--output DEST       '-' stdout (default) | file path | tcp://host:port | udp://host:port
--rotate-mb N       rotate + gzip file output
--scenario NAME     inject recurring anomaly windows (per-tool)
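
To make the pacing and backfill flags concrete, here is a minimal sketch of how Poisson-paced timestamps could be synthesized for a backfill window without sleeping. It is an assumption about the general technique, not the suite's code; `backfill_timestamps` and its parameters are hypothetical names.

```python
import random
from datetime import datetime, timedelta, timezone


def backfill_timestamps(start, window, rate, seed):
    """Yield event timestamps for a Poisson process of `rate` events/sec.

    Gaps between events are exponentially distributed, so the average rate is
    `rate` while individual gaps vary, as in real traffic. Nothing sleeps:
    the whole window is generated at full speed.
    """
    rng = random.Random(seed)  # a private RNG keeps the output reproducible
    elapsed = 0.0
    limit = window.total_seconds()
    while True:
        elapsed += rng.expovariate(rate)
        if elapsed >= limit:
            return
        yield start + timedelta(seconds=elapsed)


if __name__ == "__main__":
    start = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
    stamps = list(backfill_timestamps(start, timedelta(hours=1), rate=2.0, seed=42))
    print(len(stamps), "events between", stamps[0], "and", stamps[-1])
```

Anchoring the window to an explicit start time, rather than the current clock, is what allows the same seed to reproduce the same timestamps.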

Quick start

# Stream Apache combined logs at 50/sec forever
uvx --from git+https://github.com/expanso-io/log-simulators logsim-web --rate 50

# 24 hours of historical IoT telemetry, then exit
uvx --from git+https://github.com/expanso-io/log-simulators logsim-iot --backfill 24h --output sensors.ndjson

# A brute-force attack inside normal Windows event noise, to a UDP collector
uvx --from git+https://github.com/expanso-io/log-simulators logsim-windows \
    --scenario brute-force --rate 20 --output udp://localhost:5514

# Reproducible test fixture: same command, byte-identical output
uvx --from git+https://github.com/expanso-io/log-simulators logsim-asa \
    --seed 42 --count 1000 --backfill 1h --start-time 2026-01-15T12:00:00+00:00

# Umbrella command works too
uvx --from git+https://github.com/expanso-io/log-simulators logsim k8s --rate 30
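
The commands above send output to stdout, a file, and a UDP collector. The sketch below shows one way a destination string of that shape could be classified before a sink is opened. It illustrates the '-' | file path | tcp://host:port | udp://host:port convention only; `classify_output` is a hypothetical helper, not a function from the suite.

```python
from urllib.parse import urlsplit


def classify_output(dest):
    """Split an --output style value into (kind, target).

    kind is "stdout", "file", "tcp", or "udp"; target is None, a path string,
    or a (host, port) tuple respectively.
    """
    if dest == "-":
        return ("stdout", None)
    parts = urlsplit(dest)
    if parts.scheme in ("tcp", "udp"):
        # .port raises ValueError on a non-numeric port and is None if absent
        if parts.hostname is None or parts.port is None:
            raise ValueError(f"expected {parts.scheme}://host:port, got {dest!r}")
        return (parts.scheme, (parts.hostname, parts.port))
    return ("file", dest)


if __name__ == "__main__":
    for dest in ("-", "sensors.ndjson", "udp://localhost:5514"):
        print(dest, "->", classify_output(dest))
```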

Single-file versions of the most-used tools live in standalone/ — each is a self-contained PEP 723 script:

uv run https://raw.githubusercontent.com/expanso-io/log-simulators/main/standalone/web_access_sim.py --rate 10
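
For readers unfamiliar with PEP 723, the sketch below shows the general shape of such a script: an inline metadata comment block that uv reads, followed by ordinary Python. It is a hypothetical toy, not a copy of anything in standalone/. The paths, addresses, and status weights are made up, and flag parsing is left out for brevity.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical single-file generator illustrating the PEP 723 layout."""
import random
from datetime import datetime, timedelta, timezone

PATHS = ["/", "/index.html", "/api/items", "/login"]  # illustrative only
START = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)


def generate(count, seed, start=START):
    """Yield `count` NCSA-common-style access log lines, reproducible per seed."""
    rng = random.Random(seed)
    ts = start
    for _ in range(count):
        ts += timedelta(seconds=rng.expovariate(10))  # ~10 events/sec on average
        ip = f"10.0.0.{rng.randint(1, 254)}"
        path = rng.choice(PATHS)
        status = rng.choices([200, 404, 500], weights=[90, 8, 2])[0]
        size = rng.randint(200, 5000)
        stamp = ts.strftime("%d/%b/%Y:%H:%M:%S %z")
        yield f'{ip} - - [{stamp}] "GET {path} HTTP/1.1" {status} {size}'


if __name__ == "__main__":
    for line in generate(count=5, seed=1):
        print(line)
```

With an empty dependencies list the script needs nothing beyond the standard library. Any third-party packages listed there would be resolved by uv at run time.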

Why these formats

The May 2025 joint CISA/NSA/ACSC guidance, Priority logs for SIEM ingestion, names the sources practitioners should prioritize: OS logs, network devices, firewalls/IDS, and cloud audit trails — and explicitly recommends against shipping everything raw into the SIEM. This suite generates exactly those sources, so you can build and demo the filtering/routing layer in front of the SIEM with realistic volume, then prove zero-loss delivery (seeded, countable output) end to end.
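
One way to check zero-loss delivery with seeded, countable output is to fingerprint the stream at both ends and compare. The sketch below is an assumed approach, not a tool shipped with the suite, and `stream_fingerprint` is a hypothetical name. It hashes lines in order, so it assumes the pipeline preserves ordering.

```python
import hashlib


def stream_fingerprint(lines):
    """Return (line_count, sha256_hex) for an iterable of text lines.

    Trailing newlines are normalized so a file read and a network capture of
    the same events produce the same fingerprint.
    """
    digest = hashlib.sha256()
    count = 0
    for line in lines:
        digest.update(line.rstrip("\n").encode("utf-8"))
        digest.update(b"\n")
        count += 1
    return count, digest.hexdigest()


if __name__ == "__main__":
    sent = ["event-1\n", "event-2\n", "event-3\n"]
    received = ["event-1", "event-2", "event-3"]
    print("lossless:", stream_fingerprint(sent) == stream_fingerprint(received))
```

For a pipeline that filters or reorders events by design, comparing counts per category would be a better fit than a whole-stream hash.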

What makes the output realistic rather than random:

  • Entity consistency — the same hosts, users, IPs, and devices recur coherently (a firewall's teardown matches its build; a session keeps its IP).
  • Skewed distributions — Zipf popularity for paths/IPs, long-tail response sizes, Poisson inter-arrival times.
  • Scenario injection — a baseline of boring traffic with deterministic, recurring anomaly windows you can catch in a pipeline.
  • Seeded determinism: --seed + --start-time reproduce byte-identical streams for tests and fixtures.
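
Two of these properties, skewed popularity and seeded determinism, can be sketched in a few lines. This is a generic illustration of Zipf-weighted sampling with a private seeded RNG, not the suite's implementation. `zipf_weights`, `sample_paths`, and the path list are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Weight for rank r is 1 / r**s, so a few items dominate the traffic."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def sample_paths(paths, count, seed):
    """Draw `count` paths with Zipf-skewed popularity, reproducible per seed."""
    rng = random.Random(seed)  # avoid the global RNG so runs cannot interfere
    return rng.choices(paths, weights=zipf_weights(len(paths)), k=count)


if __name__ == "__main__":
    paths = ["/", "/products", "/cart", "/checkout", "/about"]
    hits = Counter(sample_paths(paths, count=10_000, seed=7))
    for path, n in hits.most_common():
        print(f"{path:10s} {n}")
```

A uniform random choice would give every path roughly the same hit count, which is the pattern that makes synthetic logs look fake to downstream sampling and caching logic.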

Development

git clone https://github.com/expanso-io/log-simulators
cd log-simulators
uv sync            # installs everything incl. dev tools
uv run pytest      # full test suite
uv run ruff check . && uv run ruff format --check .
uv run logsim list # see all tools

The layout is a single distribution with one subpackage per simulator plus a shared core (src/log_simulators/core/) providing pacing, sinks, entity pools, and scenario scheduling. This keeps uvx --from git+... working verbatim — a multi-package workspace would not survive git installation (see uv issues #16328 / #10728).

Lineage

Aggregates and supersedes bacalhau-project/access-log-generator (now logsim-web), bacalhau-project/sensor-log-generator (now logsim-iot), the log generators from aronchick/sample-data (now logsim-windows, logsim-vmware, and logsim-ics), and the retail transaction generator from expanso-cluster (now logsim-retail). CLI ergonomics inspired by mingrammer/flog.

License

Apache-2.0