For the past two years, every generative AI project I worked on started the same way: the first step in architectural planning was deciding how many agents we needed and what each one would specialize in. Each agent was optimized, siloed, and purpose-built for its domain. The reasoning was intuitive: a model prompted for a narrow task should, in theory, outperform a generalist handling the same job.
Three Components. Every Agent. Every Time.
After building and debugging these systems across different domains, one observation kept emerging: regardless of an agent’s supposed ‘expertise,’ the underlying architecture was always the same. Every agent reduced to just three core components:
- System Prompt: The static core behavior and constraints.
- Tools: The capabilities of the agent (enabled through function calling).
- Context: The engineering of exactly what information is passed to the agent and how it is injected into the context window. This includes both short-term and long-term state.
Barry Zhang of Anthropic has made the same observation: “We used to think agents in different domains will look very different… The agent underneath is actually more universal than we thought.” [1]
What this reveals is not that specialization was unnecessary, but that it had been implemented at the wrong level. That distinction opened a different path: rather than building a new agent for each domain, you could build one runtime and change what it carries.
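A minimal sketch of that idea, under stated assumptions: `AgentSpec`, `build_request`, and the payload shape are hypothetical names chosen for illustration, not any provider's API. It only shows that two "specialized" agents can be the same runtime function fed different system prompts, tools, and context.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class AgentSpec:
    """The three components named above; field names are illustrative."""
    system_prompt: str
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)
    context: tuple[str, ...] = ()


def build_request(spec: AgentSpec, user_message: str) -> dict:
    """Assemble a provider-agnostic request payload (hypothetical shape)."""
    return {
        "system": spec.system_prompt,
        "tools": sorted(spec.tools),  # tool names only, in a stable order
        "messages": [*spec.context, user_message],
    }


# Two "domains" differ only in what the single runtime carries.
legal = AgentSpec(
    system_prompt="You review contracts.",
    tools={"search_clauses": lambda query: f"clauses for {query}"},
    context=("Contract: NDA v2",),
)
support = AgentSpec(
    system_prompt="You debug servers.",
    tools={"fetch_logs": lambda host: f"logs for {host}"},
    context=("Host: web-01",),
)

legal_request = build_request(legal, "Summarize the risks.")
support_request = build_request(support, "Why is it returning 502?")
```

Both requests go through the same `build_request` path; nothing in the runtime knows which domain it is serving.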
Why We Scattered Before We Could Centralize
Going back to 2023, I was obsessed with RAG. I spent months fighting with chunking strategies and embedding drift just to get a model to stop hallucinating about a PDF. The two years that followed were the distributed era. For complicated workflows, especially those spanning different domains, the idea was to break them into smaller pieces and assign a dedicated agent to each one. Architectures proliferated: swarms, hierarchical supervisors, peer networks. Individual teams could work on agents relatively independently, often without coordination across the organization. The result was what some observers started calling “agent sprawl”: a fragmented landscape of autonomous scripts operating with inconsistent oversight, difficult to audit, and even harder to maintain [2].
One thing the distributed era actually got right, even if the architecture around it was a mess: it forced people to think seriously about context. The limitations of prompt engineering on monolithic systems gave way to a more sophisticated discipline: context engineering. Rather than simply writing better instructions for a single model, context engineering involves architecting the entire information environment surrounding a model: what it knows, when it knows it, and how that knowledge is structured. In multi-agent systems, where context loss between handoffs was a persistent failure mode, this remains a central design challenge.
The current phase is the move back toward a unified center, but not a return to the blunt monolith of 2023. The difference is a new kind of infrastructure that makes centralization genuinely viable.
MCP and Agent Skills: The Missing Half of the Equation
The distributed era’s instinct toward specialization was solving a real problem with the tools available at the time. Two developments have since offered a different path.
The first was the Model Context Protocol (MCP), released by Anthropic in late 2024. MCP standardized how AI models connect to external tools and data sources, providing a universal interface that any compliant tool can plug into [3]. MCP addressed the tools half of the problem identified above.
The second breakthrough was Agent Skills, a concept introduced by Anthropic in 2025. Agent Skills addressed the system prompt side of the equation. The idea is that rather than building a separate agent for each domain, you build a single general-purpose agent runtime and equip it with modular “skill” libraries that can be loaded on demand. A medical trial analysis task loads one set of instructions. A contract review task loads another. The underlying runtime does not change [4].
Together, these breakthroughs shifted the engineering focus to the third and most difficult pillar: Context. They opened a practical path toward the Cognitive Core, a single high-reasoning orchestrator that dynamically assembles the environment it needs for any given task, rather than deferring to a pre-specialized agent.
The Architecture in Practice
In practice, the Cognitive Core is a single high-reasoning model that sits at the center of a workflow, responsible for intent, planning, and verification. It does not need domain expertise encoded into it ahead of time. What it needs is the ability to identify what a given sub-task requires, pull in the right capability, complete the step, and move on. The tools, API connections, and data pipelines around it are modular and stateless; they do not need awareness of the broader task, only the input they are given at that moment.
The governed catalog of capabilities the orchestrator draws from is called the Skill Registry: a centralized library of approved MCP servers and skill prompts that can be loaded into the active context window on demand and cleared once the step is done.
This is where the third pillar comes alive through Just-in-Time (JIT) Context Hydration. Rather than loading every possible tool description and instruction set into the initial context (a practice that reliably degrades model performance through what researchers call “context rot”), the system injects only the documentation and instructions relevant to the current sub-task. Once that sub-task is finished, the context is cleared. The result is a model that stays lean and focused throughout a complex, multi-step workflow.
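The hydrate-then-clear cycle can be sketched with a context manager. This is an illustration under assumptions, not a reference implementation: `Skill`, `ContextWindow`, `hydrate`, and the registry contents are all hypothetical, and a plain dictionary stands in for the Skill Registry.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class Skill:
    """One registry entry: instructions plus the tool docs it needs (hypothetical)."""
    name: str
    instructions: str
    tool_docs: tuple[str, ...] = ()


@dataclass
class ContextWindow:
    base: list[str]                                    # always-present core prompt
    hydrated: list[str] = field(default_factory=list)  # per-sub-task material

    def render(self) -> list[str]:
        return self.base + self.hydrated


@contextmanager
def hydrate(window: ContextWindow, registry: dict[str, Skill], skill_name: str) -> Iterator[ContextWindow]:
    """Inject one skill for the duration of a sub-task, then clear it."""
    skill = registry[skill_name]  # raises KeyError for skills not in the registry
    window.hydrated = [skill.instructions, *skill.tool_docs]
    try:
        yield window
    finally:
        window.hydrated = []  # cleared even if the sub-task raises


REGISTRY = {
    "contract_review": Skill("contract_review", "Flag unusual indemnity clauses.", ("search_clauses(query)",)),
    "trial_analysis": Skill("trial_analysis", "Summarize endpoints and adverse events.", ("query_trials(trial_id)",)),
}

window = ContextWindow(base=["You are a general-purpose orchestrator."])
with hydrate(window, REGISTRY, "contract_review") as active:
    during = active.render()  # core prompt + contract-review material only
after = window.render()       # back to the core prompt alone
```

The `finally` clause is the design choice that matters here: the skill is removed whether the step succeeds or fails, so one sub-task's instructions never leak into the next.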
To see this discipline in action, consider a Technical Support Orchestrator resolving a server error. A Context Engineer faces a specific decision path:
- Strategy A: Injecting the last ten messages of chat history to preserve the “human” nuances of the conversation.
- Strategy B: Pruning that history into a three-sentence summary of intent, then using the freed context “room” to hydrate the live server logs and the troubleshooting schema for that hardware.
In my experience building these systems, Strategy B is almost always superior. By prioritizing high-signal technical data over conversational noise, we spend the context budget on material the model can actually reason over rather than on filler.
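The trade-off between the two strategies can be made concrete with a toy budget calculation. Everything here is an assumption for illustration: the 600-token budget, the sample texts, and `approx_tokens`, which counts whitespace-separated words as a crude stand-in for a real tokenizer.

```python
def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def assemble_context(budget: int, blocks: list[tuple[str, str]]) -> list[str]:
    """Add (label, text) blocks in priority order, skipping any that would exceed the budget."""
    chosen: list[str] = []
    used = 0
    for label, text in blocks:
        cost = approx_tokens(text)
        if used + cost <= budget:
            chosen.append(label)
            used += cost
    return chosen


# Illustrative inputs (made up for this sketch).
history = " ".join(["msg " * 50] * 10)        # ten chat messages, about 500 words
summary = "User reports 502 errors after deploy. Wants rollback guidance. Already restarted nginx."
logs = "ERROR upstream timed out " * 40       # about 160 words of live server logs
schema = "step check_upstream step check_disk " * 10  # about 40 words of troubleshooting schema

BUDGET = 600  # assumed context budget for this step

strategy_a = assemble_context(BUDGET, [("history", history), ("logs", logs), ("schema", schema)])
strategy_b = assemble_context(BUDGET, [("summary", summary), ("logs", logs), ("schema", schema)])
```

Under these assumed sizes, Strategy A spends most of the budget on raw history and has no room left for the logs, while Strategy B fits the summary, the logs, and the schema together.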
When Distributed Still Beats Centralized
None of this means multi-agent architectures are obsolete. A large-scale study published by Google Research in January 2026, evaluating 180 agent configurations across five different architectures, put some precise numbers to what practitioners had been observing anecdotally [5]. On tasks with strict sequential dependencies, like multi-step planning, every multi-agent architecture tested degraded performance by between 39 and 70 percent compared to a single agent. When each step of a task depends on the output of the previous one, the overhead of coordinating between agents fragments the reasoning process without adding any real benefit.
The same study found a reliability gap that is harder to ignore. Independent multi-agent systems, where agents work in parallel without communicating, amplified errors by 17.2x. A mistake by one agent propagated through the system with almost nothing to catch it. Centralized systems with an orchestrator contained that amplification to 4.4x, because the orchestrator effectively acts as a validation layer before errors can cascade downstream [5].
But the picture flips entirely for the right kind of task. On parallelizable work like financial analysis, where distinct agents can simultaneously examine revenue trends, cost structures, and market comparisons without depending on each other’s outputs, centralized multi-agent coordination improved performance by over 80 percent compared to a single agent [5].
Anthropic’s engineering team, writing about their multi-agent research system, drew a similar boundary. Agents typically consume around four times more tokens than standard chat interactions, and multi-agent systems consume around fifteen times more, making economic justification essential. But the tasks where multi-agent systems genuinely excel are identifiable: heavy parallelization, information that exceeds a single context window, and workflows requiring simultaneous coordination across many complex tools [6].
The Cognitive Core and the multi-agent swarm are not competing answers to the same question. They are answers to different questions. When a set of tasks is largely sequential, highly interdependent, and requires a coherent thread of reasoning held together throughout, the Cognitive Core model is a better fit. When tasks are genuinely parallel, mutually independent, and too large for any single context window, a distributed architecture remains the stronger choice.
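As a rough sketch, the rule of thumb above can be written as a small routing function. `TaskProfile` and `choose_architecture` are hypothetical names, and the decision order is one reading of the paragraph rather than an established algorithm; real systems would weigh token cost and error tolerance as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    sequential: bool            # does each step depend on the previous one's output?
    independent_subtasks: int   # how many pieces can run without seeing each other
    fits_one_context: bool      # does the working set fit in a single context window?


def choose_architecture(task: TaskProfile) -> str:
    """Return 'cognitive_core' or 'distributed' for a task (illustrative heuristic)."""
    # Sequential, interdependent work needs one coherent thread of reasoning.
    if task.sequential or task.independent_subtasks < 2:
        return "cognitive_core"
    # Parallel, independent, and too large for one window: distribute.
    if not task.fits_one_context:
        return "distributed"
    # Parallel but small enough for one window: default to the cheaper single core
    # (an assumption of this sketch; the text leaves this case open).
    return "cognitive_core"


planning = TaskProfile(sequential=True, independent_subtasks=5, fits_one_context=False)
research = TaskProfile(sequential=False, independent_subtasks=3, fits_one_context=False)

planning_choice = choose_architecture(planning)
research_choice = choose_architecture(research)
```

Writing the rule down this way makes the default explicit: distribution has to be justified by the task's shape rather than assumed.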
The industry’s current moment is about recognizing that distinction clearly and stopping the habit of reaching for agent sprawl by default when it is not actually warranted.
The Unglamorous Work That Makes This Actually Function
Transitioning to a Cognitive Core isn’t just a code change; it’s a shift in ownership. Someone has to actually maintain the Skill Registry, version the MCP servers, and arbitrate when two teams try to push conflicting ‘Data Analysis’ prompts into the same Core.
This shift has meaningful governance implications. Monitoring and auditing fifty autonomous agents operating independently is, in practice, very difficult. A single orchestrator with a centralized registry of approved capabilities is a much more tractable security surface. Organizations building on this model are finding that AI governance shifts from trying to monitor agent behavior after the fact to managing a registry of trusted, pre-approved capabilities up front.
An ungoverned skill registry tends to fail in two ways:
- Skill drift: without central ownership, different teams create overlapping skills (e.g., two different “SQL Expert” skills with conflicting schemas), leading to inconsistent outputs.
- Context bloat: a skill becomes a dumping ground for too many tool definitions, triggering context rot before the Core even begins the task.
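Both failure modes are mechanical enough to check automatically before a skill is approved. The lint below is a sketch under assumptions: `SkillEntry`, `lint_registry`, and the limit of eight tools per skill are invented for illustration, and matching skill names case-insensitively is only one possible way to detect overlap.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_TOOLS_PER_SKILL = 8  # assumed threshold; tune to your own context budget


@dataclass(frozen=True)
class SkillEntry:
    name: str
    owner: str               # team that submitted the skill
    version: str
    tools: tuple[str, ...]   # tool definitions the skill pulls into context


def lint_registry(entries: list[SkillEntry]) -> list[str]:
    """Report skill drift (same skill, several owners) and context bloat (too many tools)."""
    problems: list[str] = []

    owners_by_name: dict[str, set[str]] = defaultdict(set)
    for entry in entries:
        owners_by_name[entry.name.strip().lower()].add(entry.owner)
    for name, owners in sorted(owners_by_name.items()):
        if len(owners) > 1:
            problems.append(f"skill drift: '{name}' defined by {', '.join(sorted(owners))}")

    for entry in entries:
        if len(entry.tools) > MAX_TOOLS_PER_SKILL:
            problems.append(
                f"context bloat: '{entry.name}' declares {len(entry.tools)} tools (max {MAX_TOOLS_PER_SKILL})"
            )
    return problems


entries = [
    SkillEntry("SQL Expert", "analytics", "1.2.0", ("run_query",)),
    SkillEntry("sql expert", "finance", "0.9.0", ("run_query", "explain_plan")),
    SkillEntry("Everything", "platform", "3.0.0", tuple(f"tool_{i}" for i in range(12))),
]
report = lint_registry(entries)
```

Run as a pre-merge check on the registry, a lint like this moves the argument between teams to review time instead of production.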
Ultimately, a well-governed skill registry provides significant procurement leverage, because the organization owns the skill definitions. The underlying LLM or tool provider becomes a pluggable commodity. If a newer model is released that is 30% faster, it can be swapped into the Core without rewriting the business logic stored in the registry.
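One way to keep the model swappable is to have the Core depend on a narrow interface rather than on a specific provider. The sketch below assumes a hypothetical `Model` protocol with a single `complete` method and uses two stub models in place of real LLM clients; the registry is reduced to a dictionary of skill prompts.

```python
from typing import Protocol


class Model(Protocol):
    """The only thing the Core needs from a model provider (hypothetical interface)."""
    def complete(self, system: str, prompt: str) -> str: ...


class StubModelA:
    def complete(self, system: str, prompt: str) -> str:
        return f"A[{system}] {prompt}"


class StubModelB:
    def complete(self, system: str, prompt: str) -> str:
        return f"B[{system}] {prompt}"


class CognitiveCore:
    def __init__(self, model: Model, registry: dict[str, str]) -> None:
        self.model = model
        self.registry = registry  # skill name -> skill prompt, owned by the organization

    def run(self, skill: str, prompt: str) -> str:
        # Business logic lives in the registry entry, not in the model client.
        return self.model.complete(self.registry[skill], prompt)


REGISTRY = {"sql_expert": "Write ANSI SQL only."}

core = CognitiveCore(StubModelA(), REGISTRY)
before = core.run("sql_expert", "Count users.")

core.model = StubModelB()  # swap the model; the registry is untouched
after = core.run("sql_expert", "Count users.")
```

The skill prompt is identical in both calls; only the component behind the `Model` interface changed.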
The New Job: Context Orchestrator
The transition away from agent sprawl isn’t a silver bullet; it’s a trade-off. We are trading the chaotic modularity of dozens of autonomous scripts for the rigorous discipline of maintaining a central Skill Registry. This is a massive shift in how we think about “ownership” in an AI stack.
If there is one lesson I’ve taken from the transition to the Cognitive Core, it’s that the most valuable person in the room is no longer the one who can write the cleverest prompt. It’s the one who can enforce a strict “context budget”. The LLM underneath is increasingly a commodity. The long-term value will be the quality of your Registry: how well you’ve versioned your prompts, how cleanly your MCP servers are mapped, and how ruthlessly you prune the noise before it hits the model. We are moving out of the era of “AI magic” and into an era of professionalized context orchestration.