2026-02-03
TL;DR:
- 4,784 skills scraped from 5 registries
- Average quality score ~78, 31% score 90+
- 28% are duplicates
- ~10% install packages via npm/pip
- No verification, no provenance
- Treat like early npm—read what you install
- Browse all skills →
As part of jo’s upcoming launch, I collected 4,784 AI agent skills from 5 registries for safe-skill-search and scored them for quality.
This was especially important for us since we’re considering adding a self-installing skills feature to make your workflows stabilize faster.
How Skills Work
A skill is a folder with a SKILL.md file.
my-skill/
├── SKILL.md        # Instructions (required)
├── reference.md    # Loaded when needed
└── scripts/
    └── helper.py   # Can be executed
At startup, the agent’s system prompt includes the name and description of every installed skill. When a task matches, it reads the full SKILL.md into context. Progressive disclosure. Pay tokens only when needed.
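Here is a minimal sketch of that loading pattern, not how any particular agent implements it. It assumes SKILL.md opens with a `---`-delimited block of `name:` and `description:` lines, and that the folder name doubles as the skill name. `read_frontmatter`, `build_index`, and `load_skill` are hypothetical names.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse simple `key: value` pairs between the leading '---' lines (assumed layout)."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def build_index(skills_dir: Path) -> str:
    """Cheap part: one line per skill, always present in the system prompt."""
    entries = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        name = meta.get("name", skill_md.parent.name)
        entries.append(f"- {name}: {meta.get('description', '')}")
    return "\n".join(entries)


def load_skill(skills_dir: Path, name: str) -> str:
    """Expensive part: the full SKILL.md, read only when a task matches."""
    return (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")
```

The index costs a line per skill on every turn. The body costs tokens only when `load_skill` is actually called.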
Installation varies by tool:
| Tool | Personal | Project |
|---|---|---|
| Amp | ~/.config/agents/skills/ | .amp/skills/ |
| OpenClaw | ~/.openclaw/skills/ | .openclaw/skills/ |
Or invoke directly with /skill-name. The agent follows the instructions using its existing tools.
Self-Improvement Loop
The excellent OpenClaw project recently started letting users self-install skills mid-conversation. Ask for a capability, the agent finds and installs a matching skill, uses it, and it persists for next time. The agent gets better at your workflows the more you use it.
Very cool, and in line with what we’d expect from a stack that has its own machine. This pattern will likely stick, so we need to get ahead of it.
Skills vs MCP
| | Skills | MCP |
|---|---|---|
| What | Markdown + examples + scripts | Protocol for remote tool servers |
| Loaded | As-needed into context | Tool definitions on-demand |
| Executes via | Agent’s existing tools (bash, file ops) | Dedicated MCP tool calls |
| Token cost | Pay when used | Pay per tool definition |
Skills are instructions. MCP is tooling. Anthropic’s framing: “Think of Skills as custom onboarding materials that let you package expertise.”
The Registries
| Registry | Type | Skills | Avg Score | Range |
|---|---|---|---|---|
| openai-experimental | official | 5 | 83.0 | 72-92 |
| openai | official | 31 | 81.2 | 48-94 |
| anthropic | official | 16 | 78.6 | 55-95 |
| clawdhub | community | 3,764 | 78.0 | 20-95 |
| skillssh | community | 968 | 77.7 | 28-95 |
Everyone lands around 78-83 on quality. clawdhub dominates volume (79%).
Score Curves by Registry
Community registries peak around 70-79. Official registries cluster higher (80-89).
Score Distribution
31% score 90+. 56% score 80+. Sweet spot for content length: 2-10K chars.
Scoring Methodology
Each skill starts with a base score, then earns (or loses) points from file-auditable heuristics.
score = clamp( base + Σ(signal_points), 0, 100 )
base = 50
Signals are intentionally dumb and grep-able: they reward useful structure, not “good writing”.
| Signal | Points | What I look for |
|---|---|---|
| Structured workflow | +15 | Numbered steps, or sections like ## Workflow, ## Steps, ## Checklist |
| Code examples | +12 | Fenced code blocks showing commands / API usage |
| Bundled scripts | +10 | Executable artifacts: *.sh, *.py, *.js, Makefile in the skill folder |
| Clear triggers | +8 | Explicit “when to use” language (When to use, Trigger:) |
| No content | -20 | Instructions file exists but is effectively empty |
| Placeholder text | -15 | TODO, TBD, lorem ipsum, template filler |
| Very short (<500 chars) | -10 | Primary instructions under 500 characters |
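A rough sketch of how such a scorer could look. The point values and the base come from the table above. The detection regexes, the assumption that penalties stack, and the names `signal_points` and `quality_score` are mine for illustration, not the actual implementation.

```python
import re

BASE_SCORE = 50
SCRIPT_SUFFIXES = (".sh", ".py", ".js")
FENCE = "`" * 3  # Markdown code fence marker


def signal_points(skill_md: str, filenames: list[str]) -> dict[str, int]:
    """Return points for each signal that fires (hypothetical detectors)."""
    text = skill_md.strip()
    points = {}
    has_numbered_steps = re.search(r"^\s*\d+\.\s", text, re.M)
    has_workflow_heading = re.search(
        r"^##\s+(Workflow|Steps|Checklist)\b", text, re.M | re.I
    )
    if has_numbered_steps or has_workflow_heading:
        points["structured_workflow"] = 15
    if FENCE in text:
        points["code_examples"] = 12
    if any(
        f.endswith(SCRIPT_SUFFIXES) or f.rsplit("/", 1)[-1] == "Makefile"
        for f in filenames
    ):
        points["bundled_scripts"] = 10
    if re.search(r"when to use|trigger:", text, re.I):
        points["clear_triggers"] = 8
    if not text:
        points["no_content"] = -20
    if re.search(r"\b(TODO|TBD)\b|lorem ipsum", text, re.I):
        points["placeholder_text"] = -15
    if len(text) < 500:
        points["very_short"] = -10
    return points


def quality_score(skill_md: str, filenames: list[str]) -> int:
    """score = clamp(base + sum(signal_points), 0, 100)"""
    total = BASE_SCORE + sum(signal_points(skill_md, filenames).values())
    return max(0, min(100, total))
```

With every positive signal and no penalties, the result is 50 + 15 + 12 + 10 + 8 = 95. Under the stacking assumption, an empty file lands at 50 - 20 - 10 = 20.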
Example: prompt-guard (95)
| | Points |
|---|---|
| Base | 50 |
| Structured workflow (step-by-step guardrail procedure) | +15 |
| Code examples (prompt patterns, allow/deny rules) | +12 |
| Bundled scripts (runnable validation helper) | +10 |
| Clear triggers (“use when handling untrusted prompts”) | +8 |
| Total | 95 |
612 skills score below 60. Empty shells, stubs, minimal CLI wrappers.
Top Skills
| Skill | Registry | Score |
|---|---|---|
| prompt-guard | clawdhub | 95 |
| skill-creator | anthropic | 95 |
| systematic-debugging | skillssh | 95 |
| docker-expert | skillssh | 94 |
| pptx | anthropic | 94 |
| develop-web-game | openai | 94 |
The Duplicate Problem
clawdhub dominates volume. And cloning.
| Skill | Copies |
|---|---|
| auto-updater | 40 |
| polymarket | 38 |
| solana | 37 |
2,990 unique names across 3,764 skills. About 28% are duplicates.
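A duplicate rate depends on how you count. The sketch below shows two common definitions on a list of skill names. `duplicate_stats` and both definitions are illustrative assumptions, not necessarily the ones behind the figures above.

```python
from collections import Counter


def duplicate_stats(names: list[str]) -> dict[str, float]:
    """Two ways to express duplication for a list of skill names."""
    counts = Counter(names)
    total = len(names)
    unique = len(counts)
    # Copies beyond the first occurrence of each name.
    surplus = total - unique
    # Every skill whose name is shared with at least one other skill.
    sharing = sum(c for c in counts.values() if c > 1)
    return {
        "total": total,
        "unique_names": unique,
        "surplus_share": surplus / total if total else 0.0,
        "sharing_share": sharing / total if total else 0.0,
    }
```

For `["a", "a", "a", "b", "c"]` the surplus share is 2/5 and the sharing share is 3/5, so the same data can yield noticeably different headline numbers.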
What Skills Bundle
| Content | Skills | % |
|---|---|---|
| Scripts (.sh, .py, .js) | 2,803 | 59% |
| MCP/connector references | 382 | 8% |
| Assets (images, etc.) | 208 | 4% |
| allowed-tools declarations | 79 | 2% |
| SVG files | 19 | 0.4% |
59% of skills bundle executable scripts. 79 declare allowed-tools for sandboxing (a pattern from Anthropic’s examples). 19 include SVGs, which can contain JavaScript.
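If you want to run the same kind of inventory on a skill before installing it, something like this is a starting point. It is a sketch under my own assumptions: `inventory` is a hypothetical helper, the extension list mirrors the table above, and the SVG check only looks for a literal `<script` tag.

```python
from pathlib import Path

SCRIPT_SUFFIXES = {".sh", ".py", ".js"}


def inventory(skill_dir: Path) -> dict[str, list[str]]:
    """List bundled scripts and SVGs that contain a <script> element."""
    found = {"scripts": [], "svg_with_script": []}
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(skill_dir).as_posix()
        suffix = path.suffix.lower()
        if suffix in SCRIPT_SUFFIXES:
            found["scripts"].append(rel)
        elif suffix == ".svg":
            # Does not catch event-handler attributes such as onload="...".
            text = path.read_text(encoding="utf-8", errors="replace")
            if "<script" in text.lower():
                found["svg_with_script"].append(rel)
    return found
```

An empty result does not mean a skill is safe. It only means these two cheap checks found nothing.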
Package Install Patterns
| Package Manager | Skills | % |
|---|---|---|
| pip/pip3 | 467 | 9.8% |
| npm/npm i | 460 | 9.6% |
| npx | 379 | 7.9% |
| brew | 160 | 3.3% |
Node dominates (npm + npx = 839). Python next (pip + uv = 504).
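Counts like these come down to pattern matching on the instructions text. A minimal sketch, assuming simple regexes are good enough: the patterns and the `install_managers` name are illustrative, not the ones used for the table.

```python
import re

# Hypothetical patterns, one per package manager in the table above.
INSTALL_PATTERNS = {
    "pip": re.compile(r"\bpip3?\s+install\b"),
    "npm": re.compile(r"\bnpm\s+(?:install|i)\b"),
    "npx": re.compile(r"\bnpx\s+\S"),
    "brew": re.compile(r"\bbrew\s+install\b"),
}


def install_managers(text: str) -> set[str]:
    """Return the package managers a skill's instructions appear to invoke."""
    return {name for name, pat in INSTALL_PATTERNS.items() if pat.search(text)}
```

One skill can match several managers, so per-manager counts computed this way can overlap.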
What’s Missing
Skills feel like packages. The ecosystem behaves like a folder of snippets someone shared in Slack.
No evaluation criteria. Most skills don’t say what “success” looks like.
No composition model. Skills are written like they’re the only skill in the room.
51 names exist in more than one registry. pdf exists in 4 of them. Install from multiple and ask for “the pdf skill”. Should be fun!
Markdown is an installer
59% of skills bundle scripts. ~10% include npm/pip install commands. MCP doesn’t protect you. Skills route around it by telling the agent to use bash. 19 skills include SVG files, which can embed JavaScript.
1Password found infostealing malware in skill registries. Active campaigns are being tracked. Their line:
“Markdown isn’t content in an agent ecosystem. Markdown is an installer.”
This was less of a problem before. With agents that run on a persistent filesystem and compound their capabilities over time, the attack surface has grown dramatically.
No verification
No provenance. No signed releases, no stable identity. “openai/pdf” and “random-github-user/pdf” coexist.
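Until registries offer something better, one local stopgap is to pin a skill to a content hash after you have read it, and re-check before each use. This is a sketch of that idea only. It is not a substitute for signed releases, and `skill_digest` and `is_unchanged` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(skill_dir).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length prefixes keep (path, content) boundaries unambiguous.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def is_unchanged(skill_dir: Path, pinned: str) -> bool:
    """True if the folder still matches the digest recorded at review time."""
    return skill_digest(skill_dir) == pinned
```

A pin tells you the folder has not changed since you reviewed it. It says nothing about who wrote it.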
So Why Use Skills?
Because “agent prompts” don’t scale. Skills do.
A skill is the smallest unit of reusable agent behavior that’s portable, inspectable, and pay-per-use: a folder of instructions the agent loads only when relevant. No daemon. No plugin API. No protocol handshake. Just markdown + a couple files.
The good ones are compressed expertise: the checklist you forget at 2am, the debugging flowchart you wish you had printed, the framework gotchas you keep re-learning, the guardrails that stop the agent from confidently doing the wrong thing.
And the part that matters: skills are auditable artifacts. You can read them. Diff them. Pin versions. Fork them. Delete them. You can’t do that with “vibes” in a chat history.
This is early npm/pip energy: a messy ecosystem sitting on top of the right primitive. The norms will catch up. The distribution mechanism is already here.
Whether we end up with a skills ecosystem that deserves trust, or just another graveyard of copy-pasted prompts, depends on what we do next.
Browse
- Interactive table →
- skills.db (SQLite, 2.1MB)
- skills.json
curl -LO https://github.com/jo-inc/skills-db/raw/main/skills.db
sqlite3 skills.db "SELECT name, quality_score FROM skills ORDER BY quality_score DESC LIMIT 10"
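The same query works from Python if you would rather script against the database. A minimal sketch: it relies only on the `skills` table and the `name` and `quality_score` columns shown in the query above, `top_skills` is a hypothetical helper, and the default threshold of 80 mirrors the default filter in safe-skill-search.

```python
import sqlite3


def top_skills(conn: sqlite3.Connection, min_score: int = 80, limit: int = 10):
    """Return (name, quality_score) rows at or above min_score, best first."""
    query = (
        "SELECT name, quality_score FROM skills "
        "WHERE quality_score >= ? "
        "ORDER BY quality_score DESC, name LIMIT ?"
    )
    return conn.execute(query, (min_score, limit)).fetchall()


# Usage, assuming skills.db was downloaded into the current directory:
#   conn = sqlite3.connect("skills.db")
#   for name, score in top_skills(conn):
#       print(score, name)
```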
safe-skill-search is a CLI that embeds these quality scores and filters skills by default (score >= 80). Install it standalone:
# macOS (Apple Silicon)
mkdir -p ~/.local/bin  # tar -C fails if the target directory does not exist
curl -fsSL https://github.com/jo-inc/safe-skill-search/releases/latest/download/safe-skill-search-aarch64-apple-darwin.tar.gz | tar -xz -C ~/.local/bin
# Search with quality filtering
safe-skill-search search "browser automation"
*Written and analyzed with the help of ampcode.com.*