Agent Skills: 59% Ship Scripts. 12% Are Empty.


2026-02-03

TL;DR:

  • 4,784 skills scraped from 5 registries
  • Average quality score ~78, 31% score 90+
  • 28% are duplicates
  • ~10% install packages via npm/pip
  • No verification, no provenance
  • Treat like early npm—read what you install
  • Browse all skills →

As part of jo’s upcoming launch, I collected 4,784 AI agent skills from 5 registries for safe-skill-search and scored them for quality.

This was especially important for us since we’re considering adding a self-installing skills feature to make your workflows stabilize faster.

How Skills Work

A skill is a folder with a SKILL.md file.

my-skill/
├── SKILL.md           # Instructions (required)
├── reference.md       # Loaded when needed
└── scripts/
    └── helper.py      # Can be executed

At startup, the agent’s system prompt includes the name and description of every installed skill. When a task matches, it reads the full SKILL.md into context. Progressive disclosure. Pay tokens only when needed.
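The loading mechanics above can be sketched in a few lines. This is a hypothetical illustration, not any specific agent's implementation; the directory layout follows the `my-skill/SKILL.md` convention shown earlier, and the rule for extracting a description (first non-heading line) is an assumption.

```python
# Sketch of progressive disclosure: only name + description of each skill
# goes into the system prompt; the full SKILL.md is read on demand.
# Paths and the description heuristic are illustrative assumptions.
from pathlib import Path

def skill_summaries(skills_dir: str) -> list[str]:
    """Collect one-line summaries for the system prompt."""
    summaries = []
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        name = skill_md.parent.name
        # Take the first non-empty, non-heading line as the description.
        desc = next(
            (ln.strip() for ln in skill_md.read_text().splitlines()
             if ln.strip() and not ln.startswith("#")),
            "",
        )
        summaries.append(f"- {name}: {desc}")
    return summaries

def load_skill(skills_dir: str, name: str) -> str:
    """Pay the full token cost only when a task matches this skill."""
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

Only the `skill_summaries` output lives in every prompt; `load_skill` runs when a task matches, which is what keeps the token cost pay-per-use.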

Installation varies by tool:

| Tool     | Personal                   | Project           |
|----------|----------------------------|-------------------|
| Amp      | ~/.config/agents/skills/   | .amp/skills/      |
| OpenClaw | ~/.openclaw/skills/        | .openclaw/skills/ |

Or invoke directly with /skill-name. The agent follows the instructions using its existing tools.

Self-Improvement Loop

The excellent openclaw project recently added self-installing skills mid-conversation: ask for a capability, and the agent finds and installs a matching skill, uses it, and it persists for next time. The agent gets better at your workflows the more you use it.

Very cool, and in line with what we’d expect a stack that has its own machine to be able to do. This pattern will likely stick, and we need to get ahead of it.

Skills vs MCP

|              | Skills                                  | MCP                              |
|--------------|-----------------------------------------|----------------------------------|
| What         | Markdown + examples + scripts           | Protocol for remote tool servers |
| Loaded       | As-needed into context                  | Tool definitions on-demand       |
| Executes via | Agent’s existing tools (bash, file ops) | Dedicated MCP tool calls         |
| Token cost   | Pay when used                           | Pay per tool definition          |

Skills are instructions. MCP is tooling. Anthropic’s framing: “Think of Skills as custom onboarding materials that let you package expertise.”

The Registries

| Registry            | Type      | Skills | Avg Score | Range |
|---------------------|-----------|--------|-----------|-------|
| openai-experimental | official  | 5      | 83.0      | 72-92 |
| openai              | official  | 31     | 81.2      | 48-94 |
| anthropic           | official  | 16     | 78.6      | 55-95 |
| clawdhub            | community | 3,764  | 78.0      | 20-95 |
| skillssh            | community | 968    | 77.7      | 28-95 |

Everyone lands around 78-83 on quality. clawdhub dominates volume (79%).

Score Curves by Registry

Community registries peak around 70-79. Official registries cluster higher (80-89).

Score Distribution

31% score 90+. 56% score 80+. Sweet spot for content length: 2-10K chars.

Scoring Methodology

Each skill starts with a base score, then earns (or loses) points from file-auditable heuristics.

score = clamp( base + Σ(signal_points), 0, 100 )
base = 50

Signals are intentionally dumb and grep-able: they reward useful structure, not “good writing”.

| Signal                  | Points | What I look for                                                        |
|-------------------------|--------|------------------------------------------------------------------------|
| Structured workflow     | +15    | Numbered steps, or sections like ## Workflow, ## Steps, ## Checklist   |
| Code examples           | +12    | Fenced code blocks showing commands / API usage                        |
| Bundled scripts         | +10    | Executable artifacts: *.sh, *.py, *.js, Makefile in the skill folder   |
| Clear triggers          | +8     | Explicit “when to use” language (When to use, Trigger:)                |
| No content              | -20    | Instructions file exists but is effectively empty                      |
| Placeholder text        | -15    | TODO, TBD, lorem ipsum, template filler                                |
| Very short (<500 chars) | -10    | Primary instructions under 500 characters                              |
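For concreteness, here is a minimal reimplementation of the scoring formula and the signal table. The point values and clamp come straight from the article; the exact regexes are my guesses at “grep-able” heuristics, not the original scorer.

```python
# Sketch of the scoring heuristics: base 50, plus/minus signal points,
# clamped to 0..100. Regex patterns are assumptions, not the original ones.
import re
from pathlib import Path

SIGNALS = [
    # (+15) Numbered steps, or ## Workflow / ## Steps / ## Checklist sections.
    (+15, lambda text, files: bool(re.search(
        r"^\s*\d+\.\s|^##\s*(Workflow|Steps|Checklist)", text, re.M | re.I))),
    # (+12) Fenced code blocks.
    (+12, lambda text, files: "```" in text),
    # (+10) Bundled executable artifacts in the skill folder.
    (+10, lambda text, files: any(
        f.suffix in (".sh", ".py", ".js") or f.name == "Makefile" for f in files)),
    # (+8) Explicit "when to use" language.
    (+8,  lambda text, files: bool(re.search(r"when to use|trigger:", text, re.I))),
    # (-20) Effectively empty instructions file.
    (-20, lambda text, files: not text.strip()),
    # (-15) Placeholder / template filler.
    (-15, lambda text, files: bool(re.search(r"\bTODO\b|\bTBD\b|lorem ipsum", text, re.I))),
    # (-10) Primary instructions under 500 characters.
    (-10, lambda text, files: 0 < len(text) < 500),
]

def score_skill(skill_dir: str, base: int = 50) -> int:
    text = (Path(skill_dir) / "SKILL.md").read_text()
    files = [p for p in Path(skill_dir).rglob("*") if p.is_file()]
    total = base + sum(pts for pts, hit in SIGNALS if hit(text, files))
    return max(0, min(100, total))  # clamp(..., 0, 100)
```

A skill with a numbered workflow, a code fence, a bundled script, and a “when to use” line lands at 50 + 15 + 12 + 10 + 8 = 95, matching the prompt-guard example below.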

Example: prompt-guard (95)

|                                                         | Points |
|---------------------------------------------------------|--------|
| Base                                                    | 50     |
| Structured workflow (step-by-step guardrail procedure)  | +15    |
| Code examples (prompt patterns, allow/deny rules)       | +12    |
| Bundled scripts (runnable validation helper)            | +10    |
| Clear triggers (“use when handling untrusted prompts”)  | +8     |
| Total                                                   | 95     |

612 skills score below 60. Empty shells, stubs, minimal CLI wrappers.

Top Skills

| Skill                | Registry  | Score |
|----------------------|-----------|-------|
| prompt-guard         | clawdhub  | 95    |
| skill-creator        | anthropic | 95    |
| systematic-debugging | skillssh  | 95    |
| docker-expert        | skillssh  | 94    |
| pptx                 | anthropic | 94    |
| develop-web-game     | openai    | 94    |

The Duplicate Problem

clawdhub dominates volume. And cloning.

| Skill        | Copies |
|--------------|--------|
| auto-updater | 40     |
| polymarket   | 38     |
| solana       | 37     |

2,990 unique names across 3,764 skills. About 28% are duplicates.
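The duplicate numbers are easy to reproduce from the name column alone. A sketch, counting every copy beyond the first occurrence of a name as a duplicate (the article's exact definition may differ):

```python
# Sketch of duplicate accounting: group skills by name, count the copies.
# `names` stands in for the scraped name column; the field is an assumption.
from collections import Counter

def duplicate_stats(names: list[str]) -> tuple[int, int, float]:
    """Return (unique_names, total, duplicate_fraction)."""
    counts = Counter(names)
    total = len(names)
    unique = len(counts)
    # Every copy beyond the first occurrence of a name is a duplicate.
    duplicates = total - unique
    return unique, total, duplicates / total
```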

What Skills Bundle

| Content                   | Skills | %    |
|---------------------------|--------|------|
| Scripts (.sh, .py, .js)   | 2,803  | 59%  |
| MCP/connector references  | 382    | 8%   |
| Assets (images, etc.)     | 208    | 4%   |
| allowed-tools declarations| 79     | 2%   |
| SVG files                 | 19     | 0.4% |

59% of skills bundle executable scripts. 79 declare allowed-tools for sandboxing (a pattern from Anthropic’s examples). 19 include SVGs, which can contain JavaScript.

Package Install Patterns

| Package Manager | Skills | %    |
|-----------------|--------|------|
| pip/pip3        | 467    | 9.8% |
| npm/npm i       | 460    | 9.6% |
| npx             | 379    | 7.9% |
| brew            | 160    | 3.3% |

Node dominates (npm + npx = 839). Python next (pip + uv = 504).
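Detection like this is a grep pass over SKILL.md and bundled scripts. A sketch, with regexes approximating the four categories in the table (the original patterns are not published, so these are assumptions):

```python
# Sketch of package-install detection: pattern-match install invocations
# in a skill's text. Patterns approximate the table's categories.
import re

INSTALL_PATTERNS = {
    "pip":  re.compile(r"\bpip3?\s+install\b"),
    "npm":  re.compile(r"\bnpm\s+(install|i)\b"),
    "npx":  re.compile(r"\bnpx\s+\S+"),
    "brew": re.compile(r"\bbrew\s+install\b"),
}

def install_managers(text: str) -> set[str]:
    """Which package managers does this skill invoke?"""
    return {name for name, pat in INSTALL_PATTERNS.items() if pat.search(text)}
```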

What’s Missing

Skills feel like packages. The ecosystem behaves like a folder of snippets someone shared in Slack.

No evaluation criteria

Most skills don’t say what “success” looks like.

No composition model

Skills are written like they’re the only skill in the room. 51 names exist across multiple registries. pdf exists in all 4. Install from multiple and ask for “the pdf skill”. Should be fun!

Markdown is an installer

59% of skills bundle scripts. ~10% include npm/pip install commands. MCP doesn’t protect you. Skills route around it by telling the agent to use bash. 19 skills include SVG files, which can embed JavaScript.
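The SVG risk is easy to screen for, since SVG is XML and can carry `<script>` elements and `on*` event handlers. A naive pattern-match sketch (a real scanner should parse the XML rather than grep it):

```python
# Naive check for scriptable SVG: flags <script> elements and on* handlers.
# Pattern-matching is an approximation; parse the XML for anything serious.
import re

def svg_has_script(svg_source: str) -> bool:
    return bool(
        re.search(r"<script\b", svg_source, re.I)
        or re.search(r"\son\w+\s*=", svg_source, re.I)
    )
```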

1Password found infostealing malware in skill registries. Active campaigns are being tracked. Their line:

“Markdown isn’t content in an agent ecosystem. Markdown is an installer.”

This wasn’t much of a problem before, but with agents that run on a persistent filesystem and compound their capabilities, the attack surface just spiked dramatically.

No verification

No provenance. No signed releases, no stable identity. “openai/pdf” and “random-github-user/pdf” coexist.

So Why Use Skills?

Because “agent prompts” don’t scale. Skills do.

A skill is the smallest unit of reusable agent behavior that’s portable, inspectable, and pay-per-use: a folder of instructions the agent loads only when relevant. No daemon. No plugin API. No protocol handshake. Just markdown + a couple files.

The good ones are compressed expertise: the checklist you forget at 2am, the debugging flowchart you wish you had printed, the framework gotchas you keep re-learning, the guardrails that stop the agent from confidently doing the wrong thing.

And the part that matters: skills are auditable artifacts. You can read them. Diff them. Pin versions. Fork them. Delete them. You can’t do that with “vibes” in a chat history.

This is early npm/pip energy: a messy ecosystem sitting on top of the right primitive. The norms will catch up. The distribution mechanism is already here.

Whether we end up with a skills ecosystem that deserves trust, or just another graveyard of copy-pasted prompts, depends on what we do next.

Browse

curl -LO https://github.com/jo-inc/skills-db/raw/main/skills.db
sqlite3 skills.db "SELECT name, quality_score FROM skills ORDER BY quality_score DESC LIMIT 10"

safe-skill-search is a CLI that embeds these quality scores and filters skills by default (score >= 80). Install it standalone:

# macOS (Apple Silicon)
curl -fsSL https://github.com/jo-inc/safe-skill-search/releases/latest/download/safe-skill-search-aarch64-apple-darwin.tar.gz | tar -xz -C ~/.local/bin

# Search with quality filtering
safe-skill-search search "browser automation"

*Written and analyzed with the help of ampcode.com.