LLMinate - LLM CAPTCHA System


01 Apr, 2026

Detecting LLMs with Word Substrings: A Novel Approach to CAPTCHA

How a simple linguistic insight can help distinguish humans from AI

The Insight

Here's an observation: when asked to generate random nonsense words, humans and LLMs tend to behave very differently.

Humans mash keyboards: ksdjsdksdk, eokdfweinfhfejla, owiewewlkdfk

Large Language Models generate plausible-sounding nonsense: Vesperthrum, Nebulithic, Zorathium

Why? Because LLMs are trained on human language. Even when generating "random" words, they follow the patterns they have learned: combining real morphemes, respecting English phonotactics, and creating words that feel like they could exist.

The Algorithm

This insight leads to a simple detection algorithm:

Ask the user to generate 3 nonsense words
For each word, extract all possible substrings
Check if any substring of 5 or more characters appears in a dictionary of real English words
If multiple words contain real-word substrings → likely an LLM
If all words contain no real substrings → likely a human
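The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the linked proof of concept: the function names, the toy dictionary, and the "inconclusive" result for exactly one matching word are assumptions made here, and the minimum substring length of 5 is taken from the "5+ character" figure mentioned later in this post.

```python
from typing import Iterable, Set

MIN_LEN = 5  # assumed threshold, based on the post's "5+ character" remark


def substrings(word: str, min_len: int = MIN_LEN) -> Set[str]:
    """Return every substring of `word` with at least `min_len` characters."""
    w = word.lower()
    return {
        w[i:j]
        for i in range(len(w))
        for j in range(i + min_len, len(w) + 1)
    }


def contains_real_word(word: str, dictionary: Set[str], min_len: int = MIN_LEN) -> bool:
    """True if any sufficiently long substring is a dictionary word."""
    return any(s in dictionary for s in substrings(word, min_len))


def classify(words: Iterable[str], dictionary: Set[str], min_len: int = MIN_LEN) -> str:
    """Apply the rule: 2+ matching words -> LLM, none -> human."""
    hits = sum(contains_real_word(w, dictionary, min_len) for w in words)
    if hits >= 2:
        return "likely LLM"
    if hits == 0:
        return "likely human"
    # The post does not say what to do with exactly one match;
    # treating it as inconclusive is an assumption of this sketch.
    return "inconclusive"


# Hypothetical toy dictionary; a real check would load a full word list.
TOY_DICTIONARY = {"vesper", "thrum", "nebula", "lithic"}

print(classify(["Vesperthrum", "Nebulithic", "Zorathium"], TOY_DICTIONARY))
print(classify(["ksdjsdksdk", "eokdfweinfhfejla", "owiewewlkdfk"], TOY_DICTIONARY))
```

Storing the dictionary as a set keeps each lookup cheap, so generating every substring of a short word is affordable even though the number of substrings grows quadratically with word length.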

Proof of concept code available here

Why This Works

The Human Advantage

Humans produce unstructured noise rather than language-like text. When asked to make up words, we type patterns like:

Keyboard rows: asdfghjk
Random characters: kjsdhfkjsdhf
Repeated patterns: lolololol

These rarely contain real 5+ character English words by chance.
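A rough back-of-envelope estimate shows why accidental matches should be rare. The sketch below assumes letters are drawn uniformly from a 26-letter alphabet, which real keyboard mashing is not (it favors the home row), so treat the result as an order-of-magnitude guide only. The dictionary size passed in is a placeholder, not a measured count.

```python
ALPHABET_SIZE = 26


def expected_accidental_hits(string_len: int, n_words: int, word_len: int = 5) -> float:
    """Expected number of length-`word_len` windows that spell a dictionary
    word, assuming uniformly random letters (a simplifying assumption)."""
    windows = max(string_len - word_len + 1, 0)
    return windows * n_words / ALPHABET_SIZE ** word_len


# 10_000 five-letter words is a hypothetical figure chosen for illustration.
estimate = expected_accidental_hits(string_len=12, n_words=10_000)
print(f"{estimate:.4f}")
```

Under these assumptions a 12-character string has 8 five-letter windows and 26^5 = 11,881,376 possible five-letter strings, so the expected number of accidental hits stays well below 0.01.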

The LLM Weakness

LLMs generate structured randomness. Even when trying to be creative, they:

Combine real morphemes: Vesper (evening) + thrum (humming)
Follow statistical patterns learned from billions of texts
Create words with realistic character transitions

As a result, their output tends to contain real-word fragments that a dictionary lookup can detect.
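To see which fragments give a word away, a small diagnostic helper can report each match and where it starts. This is an illustrative sketch: `real_fragments` and the two-word toy dictionary are hypothetical names introduced here, and the minimum length of 5 follows the "5+ character" figure used earlier in the post.

```python
from typing import List, Set, Tuple


def real_fragments(word: str, dictionary: Set[str], min_len: int = 5) -> List[Tuple[int, str]]:
    """Return (start_index, fragment) for every dictionary word found inside `word`."""
    w = word.lower()
    found = []
    for start in range(len(w)):
        for end in range(start + min_len, len(w) + 1):
            piece = w[start:end]
            if piece in dictionary:
                found.append((start, piece))
    return found


# Hypothetical toy dictionary holding just the two morphemes from the example.
TOY_DICTIONARY = {"vesper", "thrum"}

print(real_fragments("Vesperthrum", TOY_DICTIONARY))   # [(0, 'vesper'), (6, 'thrum')]
print(real_fragments("kjsdhfkjsdhf", TOY_DICTIONARY))  # []
```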
