AI Slop Detector: Catch AI-generated text with a 270M model that runs in your browser
Slop happens. Now you can spot it - without sending your text anywhere.
We fine-tuned a tiny language model to detect AI-generated text. Because it's small (270M parameters, quantized to 242 MB), you can run it entirely in your browser - no API keys, no cloud dependencies, no data leakage. Your text never leaves your machine.
Results
| Model | Parameters | Test Accuracy | Precision | Model Link |
|---|---|---|---|---|
| GPT OSS 120B (teacher) | ~120B | 100% | 100% | |
| Gemma 3 270M (tuned) | 270M | 100% | 100% | HuggingFace |
| Gemma 3 270M Q4 (quantized) | 270M | ~95% | ~95% | |
| Gemma 3 270M (base) | 270M | ~40% | ~55% | |
The tuned model matches the 120B teacher while being over 400x smaller - and the quantized version runs entirely in a browser extension at ~95% accuracy.
Quick Start
1. Download the extension
Download this repo as ZIP (or click "Code" -> "Download ZIP" on GitHub)
Unzip the downloaded file. You should have a folder called distil-ai-slop-detector-main.
2. Load in Chrome
- Open chrome://extensions/
- Enable Developer mode (toggle in top-right)
- Click Load unpacked
- Select the distil-ai-slop-detector-main folder you just unzipped
3. Use it
- Click the extension icon in your toolbar
- Wait for model download (~253 MB, cached after first use)
- Paste text and click Analyze
That's it. The model runs locally via Wllama on CPU. No data ever leaves your machine.
Left: AI-generated text · Right: Human-written text
How We Trained AI Slop Detector
The Problem
AI-generated "slop" is flooding the internet. You've seen it: the generic LinkedIn posts, the suspiciously polished product reviews, the Reddit comments that hit every corporate communication checkbox.
Existing detectors either require API calls (privacy concern) or run models too large for local deployment. We needed something that:
- Runs locally: no API calls, works offline, keeps your data private
- Fits in a browser: small enough to load in a Chrome extension
- Stays accurate: matches large model performance on the detection task
The dataset
There's no strict definition of "AI slop": plenty of AI-generated text is genuinely useful, and humans have been producing organic slop for centuries. The definition is mostly vibes-based, but it isn't meaningless either - slop has a "you know it when you see it" quality. To sidestep these tricky questions, we took an existing dataset and checked whether we could train a model to detect at least one flavor of slop. We used this Kaggle dataset to train and evaluate the model below.
Validating That the Base Model Fails
We tested Gemma 3 270M out of the box on our test set.
The base model achieved ~40% accuracy - essentially random guessing. Common failure modes:
- Classifying formal human writing as AI-generated
- Missing obvious AI markers ("delve into", "it's worth noting")
- Inconsistent outputs across similar inputs
This confirmed the task isn't already solved by the base model - a perfect candidate for fine-tuning, provided the task itself is learnable.
Establishing a Teacher Baseline
We tested GPT OSS 120B with a system prompt explaining the task and few-shot examples.
The 120B model achieved 100% accuracy on our test set. This became our target - could we match it with a model 400x smaller?
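The exact system prompt for the teacher isn't published here; below is a hypothetical sketch (in JavaScript, like the extension code) of what a few-shot classification prompt might look like, reusing the labels and examples from the task format in the next section. The wording is an assumption, not the prompt we actually used.

```js
// Hypothetical few-shot prompt for the teacher model - illustrative only.
const TEACHER_SYSTEM_PROMPT = `
You are a classifier. Label each text as "ai_generated" or "human_written".
Respond with the label only.

Text: "lmao no way that actually worked you're a genius thanks so much!!!"
Label: human_written

Text: "We recognize the value of your feedback and remain committed to continuous improvement."
Label: ai_generated
`.trim();
```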
Defining the Task Format
```json
{
  "input": "lmao no way that actually worked you're a genius thanks so much!!!",
  "output": "human_written"
}
```

```json
{
  "input": "We recognize the value of your feedback and remain committed to continuous improvement. Your satisfaction is our top priority.",
  "output": "ai_generated"
}
```

The model outputs a single classification label: `ai_generated` or `human_written`.
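To make the label contract concrete, here is a small, hypothetical JavaScript sketch of how a caller might wrap input text in a classification prompt and normalize the model's reply into one of the two labels. The prompt wording and function names are assumptions and may differ from what the extension actually sends.

```js
// Hypothetical prompt wrapper and label parser for the classification task.
const LABELS = ["ai_generated", "human_written"];

function buildPrompt(text) {
  return (
    "Classify the following text as ai_generated or human_written. " +
    "Answer with the label only.\n\nText: " + text + "\nLabel:"
  );
}

function parseLabel(completion) {
  const normalized = completion.trim().toLowerCase();
  // Fall back to null if the model answers with anything unexpected.
  return LABELS.find((label) => normalized.includes(label)) ?? null;
}
```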
Creating the Training Pipeline
- Seed Data: We wrote ~50 examples covering obvious AI markers, subtle corporate-speak, casual human text, and edge cases (formal human writing, edited AI output)
- Synthetic Expansion: Using the Distil Labs platform, we expanded to ~10,000 training examples via knowledge distillation from the teacher model
- Fine-tuning: We trained Gemma 3 270M using LoRA for 4 epochs
- Quantization: We converted to Q4_K_M format for browser deployment (~242 MB)
Results by Difficulty
| Difficulty | Description | Examples | Accuracy |
|---|---|---|---|
| Easy | Obvious AI markers | 6 | 100% (6/6) |
| Medium | Subtle differences | 8 | 100% (8/8) |
| Hard | Edge cases | 6 | 83% (5/6) |
Overall: 95% (19/20) on the extended test suite.
Real-World Validation
We tested on content the model never saw during training:
| Content Type | Sample Size | Accuracy |
|---|---|---|
| Reddit comments | 100+ | ~92% |
| ChatGPT outputs | 50 | 98% |
| Human tweets | 50 | 94% |
| Formal emails | 30 | 88% |
The model struggles most with formal human writing (emails, business documents) - these often trigger false positives because they share stylistic patterns with AI output.
Quantization Impact
| Format | Size | Accuracy | Use Case |
|---|---|---|---|
| Full precision | ~1.1 GB | 100% | Server deployment |
| Q4_K_M (quantized) | ~242 MB | ~95% | Browser extension |
Size reduction: ~78% with minimal accuracy loss. Inference speed: ~0.5-2 seconds per query on consumer CPUs.
Train Your Own Detector
The workflow we used is generic across classification tasks. You can train your own model by chatting with Claude using the Distil Labs skill.
Install the Distil CLI
```sh
curl -fsSL https://cli-assets.distillabs.ai/install.sh | sh
distil login   # or distil signup if you're new
```
Install the Claude Skill
Download the skill repo as ZIP (or click "Code" -> "Download ZIP" on GitHub)
Unzip it and add it to your Claude Desktop or Claude Code. For Claude Code:
```
/plugin marketplace add https://github.com/distil-labs/distil-cli-skill
/plugin install distil-cli@distil-cli-skill
```
Train a Model
With the skill installed, just describe what you need to Claude:
> Hey Claude, I want to train a classifier that detects AI-generated text.
> Here are some examples in my data/ folder. Can you help me train a model?
Claude will guide you through:
- Creating a model: `distil model create my-detector`
- Preparing data files: job_description.json, config.yaml, train.csv, test.csv (see the sketch after this list)
- Running teacher evaluation: validates the task is learnable
- Training: kicks off the distillation process
- Downloading: gets the trained model to your machine
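If your labeled examples live in code rather than a spreadsheet, a small Node.js helper can emit the CSV files. This is a hypothetical sketch: the `input`/`output` column names follow the task format shown earlier and are an assumption, so check the Distil Labs documentation for the exact schema your config.yaml should reference.

```js
// make_train_csv.mjs - hypothetical helper for assembling train.csv.
// Column names ("input", "output") follow the task format above and are an
// assumption; verify against the Distil Labs docs.
import { writeFileSync } from "node:fs";

const examples = [
  { input: "lmao no way that actually worked, thanks so much!!!", output: "human_written" },
  { input: "Your satisfaction is our top priority.", output: "ai_generated" },
  // ...add the rest of your seed examples here
];

// Quote each field so commas and quotes inside the text survive.
const quote = (value) => `"${String(value).replaceAll('"', '""')}"`;

const lines = [
  "input,output",
  ...examples.map((e) => `${quote(e.input)},${quote(e.output)}`),
];
writeFileSync("train.csv", lines.join("\n") + "\n");
```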
The full workflow takes a few hours (mostly waiting for training). You can see a complete example in our Text2SQL tutorial.
Resources
For custom training assistance, visit distillabs.ai.
FAQ
Q: How does the extension work?
The repository is organized into two main parts: the browser extension code and the training data/configuration.
```
ai-slop-detector/
├── background.js       # Service worker for model coordination
├── manifest.json       # Chrome extension config
├── offscreen.html/js   # Isolated model execution context
├── popup.html/js       # Extension UI
├── wllama/             # WebAssembly runtime for inference
├── data/               # Training assets (not used at runtime)
│   ├── config.yaml
│   ├── job_description.json
│   ├── train.csv
│   └── test.csv
└── icons/              # Extension icons and demo images
```
What runs where:
- Browser: JavaScript + Wllama for fully local inference (a minimal sketch follows below)
- Training (/data/): Used to create the model via Distil Labs - not needed at runtime
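For the curious, here is a minimal sketch of what local inference with Wllama looks like. This is not the extension's exact code: the import path, the CONFIG_PATHS keys, and MODEL_URL are assumptions that depend on the bundled Wllama version and on the HuggingFace model link above.

```js
// Minimal Wllama inference sketch (not the extension's exact code).
import { Wllama } from "./wllama/esm/index.js"; // import path is an assumption

// Paths to the WebAssembly binaries shipped in wllama/; the exact keys
// depend on the Wllama version bundled with the extension.
const CONFIG_PATHS = {
  "single-thread/wllama.wasm": "./wllama/esm/single-thread/wllama.wasm",
  "multi-thread/wllama.wasm": "./wllama/esm/multi-thread/wllama.wasm",
};

// Quantized GGUF from the HuggingFace model link above (placeholder URL).
const MODEL_URL =
  "https://huggingface.co/<model-repo>/resolve/main/<model>.Q4_K_M.gguf";

const wllama = new Wllama(CONFIG_PATHS);
await wllama.loadModelFromUrl(MODEL_URL); // downloaded once, then cached by the browser

const prompt =
  "Classify the following text as ai_generated or human_written. " +
  "Answer with the label only.\n\nText: Your satisfaction is our top priority.\nLabel:";

const completion = await wllama.createCompletion(prompt, {
  nPredict: 8,             // the label is short, so a few tokens suffice
  sampling: { temp: 0.0 }, // deterministic output for classification
});
console.log(completion.trim()); // expected: "ai_generated"
```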
Q: Why not just use GPT-4 / Claude for this?
Because your text shouldn't leave your machine. AI Slop Detector runs locally, works offline, and keeps everything private. No API costs, no rate limits, no data leakage.
Q: Why not use the base Gemma model directly?
The base model only achieves ~40% accuracy - essentially random guessing. Fine-tuning is essential to make it work.
Q: Is my text sent anywhere?
No. The model downloads once from HuggingFace (~253 MB), gets cached by your browser, and runs entirely locally. Your text never leaves your machine.
Q: Why does first load take so long?
The browser downloads and caches the model (~253 MB) on first use. After that, launches are near-instant.
Q: Can I use this offline?
Yes - that's the point.
- 100% offline after initial download
- Privacy-first - no data ever leaves your device
- Fast - ~0.5-2 seconds per query on consumer CPUs
- Free - no API costs or rate limits
Q: The model flagged human text as AI-generated
The model achieves ~95% accuracy after quantization, which means ~1 in 20 predictions may be wrong. Formal human writing (business emails, academic text) sometimes triggers false positives because it shares stylistic patterns with AI output.
If you find consistent errors, please open an issue.
Q: Can you train a detector for my specific use case?
Yes! Visit distillabs.ai to discuss custom solutions.
Links
Built with Distil Labs - turn a prompt and a few examples into production-ready small language models.