Detect suspicious AI-text fingerprints in user submissions: fast, offline, no ML required.
Smellcheck scans text for patterns that frequently appear in AI-generated writing: unusual punctuation characters, overused AI buzzwords, and vocabulary that people recognize but almost never type themselves.
Important caveat: smellcheck can tell you that a text looks suspicious; it cannot reliably tell you that a text was written by AI. A flagged text might have been written by a human who just loves em dashes. A clean text could still be AI-generated. Use the results as a signal to guide human review, not as a verdict.
Smellcheck is in an early alpha stage; use it with caution. Currently, it only works for English text.
Try the live demo
How it works
Smellcheck uses static analysis only: no machine learning, no API calls, no latency, no cost. It checks for:
- Typography characters that AI models produce naturally but humans rarely type (em dashes, curly quotes, ellipsis …)
- Unicode symbols and emoji clusters common in LLM output
- AI cliché phrases (delve into, it's worth noting, tapestry of)
- Formal or legalistic vocabulary humans recognize but almost never reach for (aforementioned, heretofore, whilst)
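The checks listed above boil down to pattern matching over the raw string. The following is a minimal sketch of that idea, not smellcheck's actual implementation: the `Finding` type, the `scan` function, and the short character and phrase lists are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch of a static scan; names and lists are assumptions,
// not part of the smellcheck package.
interface Finding {
  text: string;  // matched text
  index: number; // 0-based UTF-16 offset in the input string
  kind: "typography" | "buzzword";
}

// Em dash, en dash, horizontal ellipsis, and curly quotes.
const TYPOGRAPHY = /[\u2014\u2013\u2026\u201C\u201D\u2018\u2019]/g;
// A tiny illustrative phrase list taken from the bullets above.
const BUZZWORDS = ["delve into", "it's worth noting", "tapestry of"];

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Run a global regex over the text and append every match to `out`.
function collect(re: RegExp, text: string, kind: Finding["kind"], out: Finding[]): void {
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ text: m[0], index: m.index, kind });
  }
}

function scan(text: string): Finding[] {
  const findings: Finding[] = [];
  collect(TYPOGRAPHY, text, "typography", findings);
  const phrases = new RegExp(
    "\\b(?:" + BUZZWORDS.map(escapeRegExp).join("|") + ")\\b",
    "gi"
  );
  collect(phrases, text, "buzzword", findings);
  // Report findings in reading order.
  return findings.sort((a, b) => a.index - b.index);
}
```

Because the scan is only a handful of regular expressions, it needs no network access and stays fast enough to run on every submission.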
Install
The package is not yet published to npm. Install it directly from GitHub with npm (requires Node.js):
# npm
npm install github:fbuchinger/smellcheckCLI
Example
> echo "…and there are many — of this paradigm shift 🌟." | smellcheck

"…and there are many — of this paradigm shift 🌟."

── Smellcheck Report ──────────────────────────────────
AI fingerprints detected

TYPO 2 match(es)
  "…" at position 1
    Horizontal ellipsis (…) - distinct from three dots
  "—" at position 21
    Em dash (—) - rarely typed manually

UNICODE 1 match(es)
  "🌟" at position 46
    Suspicious Unicode character (Miscellaneous symbols and pictographs): U+1F31F

BUZZ 1 match(es)
  "paradigm" at position 31
    AI buzzword/cliché: "paradigm"
──────────────────────────────────────────────────────
Note: On Windows, switch the cmd.exe code page to UTF-8 by running chcp 65001 first; otherwise the Unicode detection will not work.
In Linux / Unix, replace echo with cat.
Basic usage
# Analyze a plain text file
smellcheck report.txt

# Pipe from stdin
cat submission.txt | smellcheck

# Output raw JSON (for piping to other tools)
smellcheck --json report.txt

# Disable specific plugins
smellcheck --no-unicode --no-buzzwords report.txt

# Exit code: 0 = clean, 1 = flagged (useful in CI / git hooks)
smellcheck report.txt && echo "Clean!"
Analyzing PDFs
Smellcheck reads plain text. Use a third-party tool to extract text first, then pipe it in:
# Using pdftotext (part of poppler-utils, available on Linux/macOS/WSL)
pdftotext submission.pdf - | smellcheck

# Using pdftotext with a specific page range
pdftotext -f 1 -l 3 submission.pdf - | smellcheck

# Using pdf-to-text (Node.js, cross-platform)
npx pdf-to-text submission.pdf | smellcheck

# Save extracted text first, then analyze
pdftotext submission.pdf submission.txt && smellcheck submission.txt
Analyzing web pages
# Using curl + html2text to strip markup
curl -s https://example.com/article | html2text | smellcheck

# Using lynx
lynx -dump https://example.com/article | smellcheck
In CI / git hooks
# Fail a pull request if a generated file looks AI-written
smellcheck docs/release-notes.md || { echo "AI smell detected - please review"; exit 1; }
Plugins
All plugins are enabled by default and can be toggled individually.
| Plugin | What it detects | Why it matters |
|---|---|---|
| typography | Em dashes —, en dashes –, non-breaking spaces, zero-width chars, curly quotes “ ”, soft hyphens, ellipsis … | These characters are standard output for LLMs because training data is full of typeset documents, but on a keyboard they require special key combos most people never bother with. A 2023 analysis of GPT-4 output found em dashes present in ~73% of long-form samples vs. ~12% of human-written equivalents. |
| unicode | Emoji, pictograms, decorative symbols from Unicode blocks rarely found in plain text | LLMs frequently insert decorative Unicode when producing structured or list-heavy content, a pattern identified in Guo et al., 2023, "How Close is ChatGPT to Human Experts?". |
| buzzwords | AI clichés: delve, tapestry, nuanced, holistic, robust, leverage, cutting-edge, it's worth noting … | These phrases are statistically overrepresented in LLM output compared to human writing. The word delve, for instance, appears roughly 7× more often in ChatGPT responses than in human-written text of similar length. |
| unnatural | Vocabulary humans recognize but rarely type spontaneously: aforementioned, heretofore, whilst, elucidate, notwithstanding … | LLMs are trained on formal written corpora (legal documents, academic papers, Wikipedia) and tend to reproduce formal register even in casual contexts. Human writers almost never spontaneously choose aforementioned over "the above" or whilst over "while", which makes these words useful soft signals. See Kobak et al., 2025, "Delving into LLM-assisted writing in biomedical publications through excess vocabulary" for background on vocabulary distribution as a detection signal. |
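To make the unicode row more concrete, here is a rough sketch of how a character can be mapped to its Unicode block and reported as a code point such as U+1F31F. It is an illustration under assumptions, not the plugin's real code: `findDecorativeUnicode`, the `UnicodeHit` type, and the short block table are hypothetical, and positions are counted as 1-based code points to mirror the CLI report in the example section.

```typescript
// Hypothetical sketch; the real plugin may cover more blocks and differ in detail.
interface UnicodeHit {
  char: string;
  position: number;  // 1-based, counted in code points (assumption)
  codePoint: string; // formatted like "U+1F31F"
  block: string;
}

// A few Unicode blocks that hold emoji and decorative symbols.
const BLOCKS: Array<{ name: string; start: number; end: number }> = [
  { name: "Miscellaneous Symbols", start: 0x2600, end: 0x26ff },
  { name: "Dingbats", start: 0x2700, end: 0x27bf },
  { name: "Miscellaneous Symbols and Pictographs", start: 0x1f300, end: 0x1f5ff },
  { name: "Emoticons", start: 0x1f600, end: 0x1f64f },
];

function findDecorativeUnicode(text: string): UnicodeHit[] {
  const hits: UnicodeHit[] = [];
  let position = 0;
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);
    if (cp === undefined) break;
    position += 1;
    // Characters above U+FFFF occupy two UTF-16 code units.
    const width = cp > 0xffff ? 2 : 1;
    for (const block of BLOCKS) {
      if (cp >= block.start && cp <= block.end) {
        hits.push({
          char: text.slice(i, i + width),
          position,
          codePoint: "U+" + cp.toString(16).toUpperCase(),
          block: block.name,
        });
        break;
      }
    }
    i += width;
  }
  return hits;
}
```

Stepping by code point rather than by UTF-16 code unit keeps an emoji from being counted as two characters, so the reported position lines up with what a reader sees.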
Custom plugins
You can add your own analysis logic:
import { Smellcheck } from 'smellcheck';
import type { SlobPlugin, PluginResult } from 'smellcheck';

class MyPlugin implements SlobPlugin {
  name = 'my-plugin';

  analyze(text: string): PluginResult {
    const matches = [];
    // ... your logic
    return { plugin: this.name, flagged: matches.length > 0, matches };
  }
}

const checker = new Smellcheck();
checker.use(new MyPlugin());
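As a fuller illustration of the plugin contract, the sketch below fills in the analysis logic for a hypothetical plugin that flags stock transition phrases at the start of a sentence. The interfaces are redeclared locally (copied from the shapes under "Result shape") so the snippet runs on its own; in a real project they would presumably be imported from the package. `TransitionPlugin` and its phrase list are invented for this example.

```typescript
// Local copies of the shapes documented under "Result shape"; assumed to
// match what the smellcheck package exports.
interface Match {
  text: string;
  index: number;
  length: number;
  plugin: string;
  reason: string;
}
interface PluginResult {
  plugin: string;
  flagged: boolean;
  matches: Match[];
}
interface SlobPlugin {
  name: string;
  analyze(text: string): PluginResult;
}

// Hypothetical plugin: flags a few transition phrases that open a sentence.
class TransitionPlugin implements SlobPlugin {
  name = "transitions";
  // Group 1 is the sentence boundary, group 2 the phrase itself.
  private pattern = /(^|[.!?]\s+)(Furthermore|Moreover|In conclusion),/g;

  analyze(text: string): PluginResult {
    const matches: Match[] = [];
    this.pattern.lastIndex = 0; // reset state of the global regex
    let m: RegExpExecArray | null;
    while ((m = this.pattern.exec(text)) !== null) {
      const phrase = m[2];
      matches.push({
        text: phrase,
        index: m.index + m[1].length, // skip the boundary characters
        length: phrase.length,
        plugin: this.name,
        reason: 'Stock transition phrase: "' + phrase + '"',
      });
    }
    return { plugin: this.name, flagged: matches.length > 0, matches };
  }
}
```

An instance of such a class would then be registered with checker.use(...) as in the snippet above.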
Configuration
smellcheck.config.json
Place this file in your project root and createSmellcheck() will pick it up automatically:
{
"plugins": {
"typography": true,
"unicode": true,
"buzzwords": {
"extra": ["synergize", "circle back"],
"exclude": ["robust"]
},
"unnatural": {
"extra": ["heretofore"],
"exclude": ["whilst"]
}
}
}
Library usage
Async (loads config file)
import { createSmellcheck } from 'smellcheck';

const checker = await createSmellcheck({
  plugins: {
    unicode: false,                       // disable a plugin
    buzzwords: { extra: ['synergize'] },  // extend word lists
    unnatural: { exclude: ['whilst'] },   // remove false positives
  }
});

const result = checker.analyze(text);
console.log(result.flagged);    // true | false
console.log(result.plugins);    // per-plugin breakdown
console.log(result.allMatches); // all matches sorted by position
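The extra and exclude options suggest that each word-list plugin starts from a built-in list, appends the extra entries, and drops the excluded ones. How smellcheck resolves these internally is not documented here, so the following is only a sketch of one plausible reading; `resolveWordList` and `WordListOption` are hypothetical names.

```typescript
// Hypothetical option shape, modeled on the config examples in this README.
type WordListOption = boolean | { extra?: string[]; exclude?: string[] };

// Assumed semantics: false disables the list, true keeps the defaults,
// an object adds `extra` and removes `exclude` (case-insensitively).
function resolveWordList(defaults: string[], option: WordListOption = true): string[] {
  if (option === false) return [];
  if (option === true) return defaults.slice();

  const excluded: Record<string, boolean> = {};
  for (const word of option.exclude || []) {
    excluded[word.toLowerCase()] = true;
  }

  const seen: Record<string, boolean> = {};
  const result: string[] = [];
  for (const word of defaults.concat(option.extra || [])) {
    const key = word.toLowerCase();
    if (excluded[key] || seen[key]) continue; // skip excluded words and duplicates
    seen[key] = true;
    result.push(word);
  }
  return result;
}
```

Under these assumptions, passing the defaults ["delve", "robust", "tapestry"] with the buzzwords settings from smellcheck.config.json would yield delve, tapestry, synergize, and circle back.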
Synchronous (no config file)
import { Smellcheck } from 'smellcheck';

const checker = new Smellcheck({ plugins: { unicode: false } });
const result = checker.analyze(text);
With HTML rendering
import { createSmellcheck, renderHtml, renderLegendHtml, renderSummaryHtml } from 'smellcheck';

const checker = await createSmellcheck();
const result = checker.analyze(userSubmission);

document.getElementById('preview').innerHTML = renderHtml(userSubmission, result);
document.getElementById('legend').innerHTML = renderLegendHtml();
document.getElementById('summary').innerHTML = renderSummaryHtml(result);
Live textarea integration
import { Smellcheck, watchTextarea, renderHtml } from 'smellcheck';

const checker = new Smellcheck();
const textarea = document.getElementById('submission') as HTMLTextAreaElement;
const preview = document.getElementById('preview');

// Analyzes on every keystroke (debounced 300 ms by default)
const cleanup = watchTextarea(textarea, (text) => {
  const result = checker.analyze(text);
  preview.innerHTML = renderHtml(text, result);
});

// Call cleanup() to remove the event listener when done
Clipboard (browser)
import { readFromClipboard, Smellcheck } from 'smellcheck';

const text = await readFromClipboard();
const result = new Smellcheck().analyze(text);
Result shape
interface SmellcheckResult {
  flagged: boolean;        // true if ANY plugin flagged
  plugins: PluginResult[];
  allMatches: Match[];     // sorted by position
}

interface PluginResult {
  plugin: string;
  flagged: boolean;
  matches: Match[];
}

interface Match {
  text: string;    // matched text
  index: number;   // position in original string
  length: number;
  plugin: string;
  reason: string;  // human-readable explanation
}
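A result in this shape is straightforward to post-process. The sketch below shows two small helpers a consumer might write: one counts matches per plugin, the other cuts a short excerpt around a match for display to a reviewer. Both `countByPlugin` and `excerpt` are hypothetical and not part of the package, and the sketch assumes that index is a 0-based offset into the original string.

```typescript
// Local copies of the documented shapes so the sketch is self-contained.
interface Match {
  text: string;
  index: number; // assumed to be a 0-based offset
  length: number;
  plugin: string;
  reason: string;
}
interface PluginResult {
  plugin: string;
  flagged: boolean;
  matches: Match[];
}
interface SmellcheckResult {
  flagged: boolean;
  plugins: PluginResult[];
  allMatches: Match[];
}

// Count how many matches each plugin contributed.
function countByPlugin(result: SmellcheckResult): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const match of result.allMatches) {
    counts[match.plugin] = (counts[match.plugin] || 0) + 1;
  }
  return counts;
}

// Return the match plus up to `radius` characters of context on each side.
function excerpt(source: string, match: Match, radius: number = 10): string {
  const start = Math.max(0, match.index - radius);
  const end = Math.min(source.length, match.index + match.length + radius);
  return source.slice(start, end);
}
```

Showing the excerpt next to the reason string gives a human reviewer enough context to judge whether a flag is a false positive.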
License
MIT
GenAI tools assisted in the creation of smellcheck.