LLM Steganography: How AI Models Could Hide Messages in Plain Text


Interactive Research Demo

Unicode Steganography

by Patrick Vuscan

How invisible characters and visual lookalikes can carry secret messages through ordinary text, and what that means for LLM safety.

A sufficiently capable model could embed covert signals in its outputs, invisible to human readers but recoverable by another model or process. Try it yourself below.
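As a minimal sketch of how such a channel could work (my own illustration, not the article's demo code): zero-width characters render as nothing, so a bit string can ride inside ordinary text. Here ZERO WIDTH SPACE stands for 0 and ZERO WIDTH NON-JOINER for 1; the choice of characters and the insertion point are arbitrary assumptions.

```python
# Illustrative sketch of a zero-width-character covert channel.
# Assumptions: ZWSP encodes 0, ZWNJ encodes 1, payload sits after the
# first word. None of this is specified by the article; it is one of
# many possible encodings.
ZW0, ZW1 = "\u200b", "\u200c"  # ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER

def embed(cover: str, bits: str) -> str:
    """Hide a bit string in cover text using invisible characters."""
    payload = "".join(ZW0 if b == "0" else ZW1 for b in bits)
    head, _, tail = cover.partition(" ")
    return head + payload + " " + tail

def extract(stego: str) -> str:
    """Recover the bit string by keeping only the invisible characters."""
    return "".join("0" if ch == ZW0 else "1"
                   for ch in stego if ch in (ZW0, ZW1))

stego = embed("hello world", "1011")
print(stego)            # displays as "hello world" in most renderers
print(extract(stego))   # → 1011
```

The stego text is byte-for-byte different from the cover text but visually identical, which is exactly why a human reviewer misses it.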

Why This Matters for AI Safety

LLM steganography is a concrete example of how AI deception could work in practice. Each of the techniques demonstrated above fools human readers, but none survives a scanner built to look for them: a Unicode category check catches zero-width characters, a homoglyph alphabet check catches Cyrillic substitutions, and code-point inspection catches payloads carried in variation selectors.
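The three checks described above can be sketched in a few lines, assuming Python's `unicodedata` module (the detector names and thresholds here are my illustration, not the article's scanner): format characters (category `Cf`) cover zero-width characters, Unicode names flag Cyrillic letters mixed into Latin text, and two code-point ranges cover variation selectors.

```python
import unicodedata

# Illustrative scanner sketch for the three channels named in the text.
# Assumption: VS1-VS16 and VS17-VS256 are the variation selector ranges.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]

def scan(text: str) -> list[str]:
    findings = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if unicodedata.category(ch) == "Cf":
            # Catches zero-width characters (ZWSP, ZWNJ, ZWJ, ...).
            findings.append(f"format character U+{cp:04X} at index {i}")
        elif any(lo <= cp <= hi for lo, hi in VS_RANGES):
            findings.append(f"variation selector U+{cp:04X} at index {i}")
        elif ch.isalpha() and "CYRILLIC" in unicodedata.name(ch, ""):
            # Catches lookalikes such as U+0430 posing as Latin 'a'.
            findings.append(f"Cyrillic homoglyph {ch!r} at index {i}")
    return findings

print(scan("pаyment"))  # the second letter is CYRILLIC SMALL LETTER A
```

A real scanner would also whitelist legitimately Cyrillic text and the variation selectors used by emoji, but even this crude version defeats all three techniques.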

The harder question for AI alignment is whether a model could invent an encoding that passes human review and beats automated scanners it hasn't seen before.