Anthropic shipped /ultraplan and /ultrareview for Claude Code last month. I spent a week testing them against skills I already built. They’re good skills, with a verification loop I hadn’t implemented yet. The difference is in how they ship: they run on cloud servers instead of your machine, hide their prompts instead of showing them, and bill extra after three free runs.
What the Ultra Features Do
/ultraplan generates a multi-phase implementation plan with verification checkpoints. /ultrareview runs a code review pipeline with a merge-readiness assessment at the end. Both run remotely: your code gets sent to Anthropic’s cloud, processed through a pipeline you can’t inspect, and the results come back. After three free uses on Pro or Max, each run bills as extra usage. I’ve seen reports of $5-$20 per run depending on codebase size.
You cannot read the prompts driving these features. You cannot see the pipeline stages or the skill markdown that defines the review criteria. Every previous Claude Code skill (/review, /plan, /simplify) shipped as readable markdown in the install directory. You could open them, learn from them, modify them, build on them.
A/B Testing Against My Own Pipeline
I ran /ultrareview against my own pr-review pipeline on the same set of PRs. Blind comparison, same codebase, same changes.
Core review quality is comparable. Both catch the same class of issues: security findings, logic errors, style violations. My pipeline dispatches three parallel reviewers (security, business logic, code quality) and merges their findings. The ultra version produces similar coverage.
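For concreteness, here is a minimal sketch of the dispatch-and-merge shape my pipeline uses. The prompts and the `run_reviewer` helper are illustrative placeholders, not the actual skill; the real work happens in the model call that’s elided here.

```python
# Sketch: dispatch three reviewer roles in parallel, then merge their findings.
# run_reviewer() stands in for the model invocation and is not a real API.
from concurrent.futures import ThreadPoolExecutor

REVIEWERS = {
    "security": "Review this diff for security issues...",
    "business-logic": "Review this diff for logic errors against the stated intent...",
    "code-quality": "Review this diff for style and maintainability issues...",
}

def run_reviewer(role: str, prompt: str, diff: str) -> list[dict]:
    """Call the model with a role-specific prompt; return structured findings."""
    raise NotImplementedError  # model call elided

def review_pr(diff: str) -> list[dict]:
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        futures = {
            pool.submit(run_reviewer, role, prompt, diff): role
            for role, prompt in REVIEWERS.items()
        }
        findings = []
        for future, role in futures.items():
            for finding in future.result():
                finding["reviewer"] = role
                findings.append(finding)
    # Sort merged findings by severity so CRITICALs surface first.
    order = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
    return sorted(findings, key=lambda f: order.get(f.get("severity"), 4))
```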
Where /ultrareview adds something is at the end. It runs a verification step that checks whether the review findings are valid against the current code state, then produces a merge-readiness verdict. My pipeline didn’t have that. The step catches the false positives that plague AI code review. I described this problem in my piece on adaptive thinking variance, where false positive CRITICALs send developers down rabbit holes. Re-checking findings against actual code before presenting them is a genuine improvement.
I built the same thing. A verification agent that takes review output, re-reads the relevant code sections, filters out findings that don’t hold up, and scores merge-readiness by severity of what remains. Not novel engineering, just the kind of step anyone running AI code review eventually adds after watching enough false positives slip through.
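A rough sketch of that verification step, assuming a hypothetical `confirm_finding` call that asks the model to re-read the code and judge whether the finding still holds:

```python
# Sketch of the verification step: re-check each finding against the current
# code, drop what no longer holds, score merge-readiness by what remains.
from pathlib import Path

SEVERITY_WEIGHT = {"CRITICAL": 100, "HIGH": 25, "MEDIUM": 5, "LOW": 1}

def confirm_finding(finding: dict, source: str) -> bool:
    """Ask the model whether the finding holds against this code. Elided."""
    raise NotImplementedError

def verify(findings: list[dict], repo_root: Path) -> dict:
    confirmed = []
    for finding in findings:
        source = (repo_root / finding["file"]).read_text()
        if confirm_finding(finding, source):  # false positives get filtered here
            confirmed.append(finding)
    score = sum(SEVERITY_WEIGHT.get(f["severity"], 0) for f in confirmed)
    return {
        "findings": confirmed,
        # Any CRITICAL (or enough lesser findings) blocks the merge verdict.
        "merge_ready": score < SEVERITY_WEIGHT["CRITICAL"],
        "score": score,
    }
```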
The Tension with Open Skills
Anthropic’s own engineers have been publicly advocating for skills as the right abstraction. Barry Zhang and Mahesh Murag, both on the Claude Code team, have said variations of “Stop building agents. Build Skills.” The message: skills are open, inspectable, composable units of methodology. Build them. Share them.
My entire toolkit is built on that idea. Skills as methodology, agents as domain knowledge, the handyman pulling the right tool from the toolbox. It works because you can read the skill, understand what it does, adapt it to your context.
/ultraplan and /ultrareview run the same kind of pipeline I run locally: phases that produce artifacts, verification steps, parallel agents. Nothing about cloud execution enables capabilities that local execution can’t provide. I run 10 parallel review agents on my laptop during full codebase reviews. Worktrees handle isolation. Headless sessions handle parallelism.
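The local setup is not exotic. A sketch of the shape, assuming git worktrees for isolation and Claude Code’s non-interactive print mode for parallel sessions; the agent names, prompts, and paths are illustrative, and the exact CLI flags may differ by version:

```python
# Sketch: isolate each review agent in its own git worktree and run Claude Code
# headless in parallel. Prompts and worktree names are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

AGENTS = ["security", "business-logic", "code-quality"]

def run_agent(name: str, base: Path) -> str:
    worktree = base / f"review-{name}"
    subprocess.run(["git", "worktree", "add", str(worktree), "HEAD"], check=True)
    try:
        result = subprocess.run(
            ["claude", "-p", f"Run the {name} review skill on this worktree"],
            cwd=worktree, capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        subprocess.run(["git", "worktree", "remove", "--force", str(worktree)], check=True)

def run_all(base: Path) -> dict[str, str]:
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        return dict(zip(AGENTS, pool.map(lambda n: run_agent(n, base), AGENTS)))
```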
The value of open skills was never just the output. I learned more about code review methodology by reading Claude Code’s /review skill than from most blog posts about code review. The prompts showed what the tool prioritized, what it checked first, how it structured findings. That transparency made me a better skill author. /ultrareview produces review output but doesn’t teach you anything about how to build a review pipeline.
The Commercial Logic
This follows a common pattern. You build an open ecosystem. People invest in it: build skills, share patterns, develop expertise. The ecosystem becomes valuable. Then you ship premium features that use the same architecture but close the implementation.
It’s not limited to review and planning. Anthropic also launched Claude Security, a cloud-hosted vulnerability scanner that traces data flows, validates findings through adversarial verification, and suggests patches. The same pattern: runs remotely, no visibility into the underlying prompts or pipeline, currently in beta for Enterprise with Team and Max access coming. A security review pipeline is something you can build with the existing skill architecture. Scan, verify, patch: three agents in a pipeline with phase gates between them.
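A sketch of that shape under the existing skill architecture, with placeholder phase functions and an artifact written at each gate so the next phase, or a human, can inspect it:

```python
# Sketch of a phase-gated scan/verify/patch pipeline. Phase bodies are
# placeholders; each gate persists its artifact before the next phase runs.
import json
from pathlib import Path

ARTIFACTS = Path("artifacts")

def scan(repo: Path) -> list[dict]: ...               # find candidate vulnerabilities
def verify(findings: list[dict]) -> list[dict]: ...   # adversarial re-check, drop weak findings
def patch(findings: list[dict]) -> list[dict]: ...    # propose fixes for confirmed findings

def gate(name: str, payload) -> None:
    """Phase boundary: write the artifact for inspection before moving on."""
    ARTIFACTS.mkdir(exist_ok=True)
    (ARTIFACTS / f"{name}.json").write_text(json.dumps(payload, indent=2))

def run(repo: Path) -> list[dict]:
    findings = scan(repo)
    gate("scan", findings)
    confirmed = verify(findings)
    gate("verify", confirmed)
    patches = patch(confirmed)
    gate("patch", patches)
    return patches
```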
Anthropic didn’t promise eternal openness, and these features aren’t worthless. But shipping them as opaque cloud services when they could have been open local skills, the way /review and /plan shipped, is a deliberate choice. The message shifted from “here’s how we build skills, now you build them too” to include “here are skills we built that you can’t see, and they cost extra.”
What This Means for the Skill Ecosystem
The skill ecosystem’s strength is composability. I can take a review skill, pair it with a Go agent, wrap it in a pipeline that saves artifacts at phase boundaries. I can inspect every piece. When something fails, I can diagnose it because I can read the prompts. You can’t compose what you can’t read, and you can’t diagnose failures in a stage you can’t inspect.
If Anthropic ships more features this way, the ecosystem splits into open skills you can build on and closed skills you can pay for. The closed ones will probably be better out of the box because Anthropic has more resources to refine them. The open ones will be more adaptable because you can modify them. That split favors users who can build their own skills. For everyone else, the premium tier becomes the default because the alternative requires expertise that the closed skills no longer help you develop.
I recreated the verification step and it lives in my toolkit where I can see it, modify it, and compose it with everything else. But I have months of accumulated skill-building experience. The shift from open to opaque makes it harder for new people to develop that experience by studying how the built-in skills work.
These are prompt pipelines producing artifacts through phased methodology. That’s what skills are. The question is whether Anthropic ships new capabilities as open skills people can learn from, or as closed services people can subscribe to. The last month suggests a direction.