MCP, we barely knew thee...

Last night, someone threw a “funeral” for MCP on X. The haters are writing eulogies about token spend, while teams from Uber and Duolingo are on the Dev Summit roster in NYC today discussing internal MCP work. The critics are wrong about why MCP is struggling, but they’re right that something is missing.

Perplexity Says MCP Sucks.

Denis Yarats, Perplexity’s CTO, announced in March that they’re moving away from MCP internally. The numbers are damning: three MCP servers consumed 143,000 of 200,000 available context tokens before the agent could do any real work. Seventy-two percent of the context window gone before the model has seen a single user query.

In Part 1, I argued that the missing thing is trust. Not in the identity sense (MCP’s own auth docs use the word that way), but in verifiable data-handling guarantees: where does this data go, who stores it, for how long, under which jurisdiction’s laws?

MCP reported over 97 million monthly SDK downloads in December, and it still has no standard sensitivity vocabulary or enforcement semantics for saying what a payload contains and where it can go.

Here’s what I built to fill that gap. Working code, running in production for six months. It’s messy, produces false positives, and has blocked me from editing my own essays more than once. I’m glad I have it.

Every prompt gets scanned before it reaches any inference provider. The classifier detects eight content categories: PII, financial data, health information, relationship details, credentials, strategic business data, legal content, and information about children. Each category maps to a sensitivity level, which requires a minimum provider trust tier.

When I started routing across multiple LLM providers six months ago, the classifier got built early, because the first time I sent a prompt containing my Oura Ring health data to a model, I had an involuntary physical reaction to the idea that it might end up in a training set. The classifier started as gut instinct and became infrastructure. Six months later, it’s the component I’d rebuild first if I had to start over: it’s the foundation.

An essay about medical data governance is not a medical record.

A prompt asking about salary benchmarks is not a salary.

Obvious to us, inscrutable to regex.

Here’s the pattern matching:

# All patterns compiled with re.IGNORECASE
CATEGORY_PATTERNS = {
    "pii":           r"\b\d{3}-\d{2}-\d{4}\b",  # SSN format (production also checks email, phone, DOB)
    "health":        r"\b(surgery|MRI|diagnosis|prescription|oura|HRV)\b",
    "financial":     r"\b(bank account|runway|burn rate|salary|compensation)\b",
    "relationships": r"\b(partner|girlfriend|divorce|custody|intimate)\b",
    "credentials":   r"\b(api[_-]?key|secret[_-]?key|sk-[a-zA-Z0-9]+)\b",
    "strategic":     r"\b(competitor|acquisition|roadmap|fundraising)\b",
    "legal":         r"\b(attorney|lawsuit|NDA|settlement|litigation)\b",
    "parenting":     r"\b(children|school|custody|co-?parent)\b",
}

The base classifier emits PUBLIC, CONFIDENTIAL, or SECRET. Credentials trigger SECRET. Health, financial, PII, relationships, strategic, legal, parenting trigger CONFIDENTIAL. In production, INTERNAL is also emitted by positive detectors for proprietary but non-regulated content (code, internal domains, repo paths, ticket IDs) which I’ve omitted here for brevity. Secondary heuristics downgrade abstract editorial or research content from CONFIDENTIAL to INTERNAL.
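A minimal sketch of how the base classifier could turn pattern hits into a level, assuming only the rule stated above (credentials are SECRET, every other detected category is CONFIDENTIAL). The names `CATEGORY_LEVEL`, `LEVEL_ORDER`, and `classify` are illustrative, the pattern table is abbreviated so the block runs on its own, and the INTERNAL detectors and downgrade heuristics are left out:

```python
import re

# Abbreviated copy of the pattern table so this sketch is self-contained.
CATEGORY_PATTERNS = {
    "pii": r"\b\d{3}-\d{2}-\d{4}\b",
    "health": r"\b(surgery|MRI|diagnosis|prescription|oura|HRV)\b",
    "financial": r"\b(bank account|runway|burn rate|salary|compensation)\b",
    "credentials": r"\b(api[_-]?key|secret[_-]?key|sk-[a-zA-Z0-9]+)\b",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in CATEGORY_PATTERNS.items()
}

# Assumed mapping, following the rule in the text: credentials are
# SECRET, every other detected category is CONFIDENTIAL.
CATEGORY_LEVEL = {name: "confidential" for name in CATEGORY_PATTERNS}
CATEGORY_LEVEL["credentials"] = "secret"

# Base classifier levels only, lowest to highest.
LEVEL_ORDER = ["public", "confidential", "secret"]


def classify(prompt: str) -> tuple[str, list[str]]:
    """Return (highest level detected, sorted list of matched categories)."""
    hits = sorted(name for name, rx in COMPILED.items() if rx.search(prompt))
    level = max(
        (CATEGORY_LEVEL[name] for name in hits),
        key=LEVEL_ORDER.index,
        default="public",
    )
    return level, hits
```

Run against "salary benchmarks for engineers", this sketch returns CONFIDENTIAL with a financial hit, which is exactly the kind of false positive the secondary heuristics exist to catch.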

Then the routing decision:

SENSITIVITY_TIER_REQUIRED = {
    "public": "untrusted",      # Can go anywhere
    "internal": "standard",     # Paid APIs with acceptable ToS
    "confidential": "trusted",  # Strong data-handling guarantees only
    "secret": "sovereign",      # On-device or self-hosted only
}

# Config-specific handling postures in MY stack: retention, training
# policy, jurisdiction, contract terms, feature eligibility.
PROVIDER_TIERS = {
    "ollama_local":              "sovereign",  # Nothing leaves the machine.
    "anthropic_zdr":             "trusted",    # Zero data retention. HIPAA BAA.
    "vertex_standard":           "trusted",    # Enterprise ToS. No training.
    "openai_default":            "standard",   # 30-day abuse-monitoring retention.
    "openai_zdr":                "trusted",    # Zero retention by arrangement.
    "fireworks_open_models":     "standard",   # ZDR by default for open models.
    "deepseek_default":          "untrusted",  # PRC jurisdiction. Training opt-out unclear.
    "grok":                      "blocked",    # My policy. Your mileage may vary.
}

These tiers are scoped to my configuration. Anthropic’s zero-retention and HIPAA coverage are feature-specific, not blanket. OpenAI offers ZDR by request and supports BAAs. The real trust unit is provider x org x feature x region x contract. That granularity strengthens the case for protocol-level metadata: every deployment needs its own trust matrix, which means the protocol needs to carry metadata that makes routing decisions possible.
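The check that ties the two tables together can be sketched as follows. This assumes the tiers form a strict ordering (untrusted < standard < trusted < sovereign) in which a higher tier satisfies a lower requirement, that "blocked" sits outside the ordering, and that unknown providers fail closed. `TIER_RANK`, `is_allowed`, and `eligible_providers` are hypothetical names, and the provider table is abbreviated:

```python
# Assumed ordering: a higher rank satisfies any lower requirement.
TIER_RANK = {"untrusted": 0, "standard": 1, "trusted": 2, "sovereign": 3}

SENSITIVITY_TIER_REQUIRED = {
    "public": "untrusted",
    "internal": "standard",
    "confidential": "trusted",
    "secret": "sovereign",
}

# Abbreviated copy of the provider table above.
PROVIDER_TIERS = {
    "ollama_local": "sovereign",
    "anthropic_zdr": "trusted",
    "openai_default": "standard",
    "deepseek_default": "untrusted",
    "grok": "blocked",
}


def is_allowed(sensitivity: str, provider: str) -> bool:
    """True if the provider's tier meets the minimum for this sensitivity."""
    tier = PROVIDER_TIERS.get(provider, "blocked")  # unknown: fail closed
    if tier not in TIER_RANK:  # "blocked" is never eligible
        return False
    required = SENSITIVITY_TIER_REQUIRED[sensitivity]
    return TIER_RANK[tier] >= TIER_RANK[required]


def eligible_providers(sensitivity: str) -> list[str]:
    """All configured providers that may receive this sensitivity level."""
    return sorted(p for p in PROVIDER_TIERS if is_allowed(sensitivity, p))
```

Keeping "blocked" outside the ranking means a policy block cannot be satisfied by any sensitivity level, including public.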

When the classifier detects a mismatch, it returns a structured tool-result error. Not a transport-level failure: the agent runtime sees the rejection, reasons over the structured details, and can self-correct — hot-swapping to a higher-tier provider and retrying, or halting and escalating. The error payload lives in _meta so it’s accessible to runtimes and gateways without breaking strict MCP CallToolResult validation:

{
    "content": [{"type": "text",
        "text": "Trust tier mismatch: requires 'trusted' or higher"}],
    "_meta": {
        "error_details": {
            "error": "trust_tier_mismatch",
            "detected_sensitivity": "confidential",
            "blocked_categories": ["health"],
            "bound_tier": "standard",
            "required_tier": "trusted"
        }
    },
    "isError": true
}

This is the exact class of error I got when trying to send my own essay to GPT for editing. My orchestrator routes editorial tasks to GPT as a tool call. The word “surgery” triggered the health category. GPT is STANDARD tier. Block. The orchestrator saw the structured rejection, but I hadn’t built the retry-with-upgrade path yet, so it just surfaced the error.
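For illustration, here is one way the missing retry-with-upgrade path could consume that rejection. It is a sketch under assumptions, not the production code: `mismatch_error` and `pick_upgrade` are hypothetical helpers, and the tier ordering is assumed to be untrusted < standard < trusted < sovereign.

```python
from typing import Optional

# Assumed ordering of provider trust tiers, lowest to highest.
TIER_RANK = {"untrusted": 0, "standard": 1, "trusted": 2, "sovereign": 3}


def mismatch_error(sensitivity: str, categories: list[str],
                   bound_tier: str, required_tier: str) -> dict:
    """Build a tool result shaped like the structured rejection above."""
    return {
        "content": [{
            "type": "text",
            "text": f"Trust tier mismatch: requires '{required_tier}' or higher",
        }],
        "_meta": {
            "error_details": {
                "error": "trust_tier_mismatch",
                "detected_sensitivity": sensitivity,
                "blocked_categories": categories,
                "bound_tier": bound_tier,
                "required_tier": required_tier,
            }
        },
        "isError": True,
    }


def pick_upgrade(result: dict, provider_tiers: dict[str, str]) -> Optional[str]:
    """Return a provider that satisfies the rejection, or None to escalate."""
    details = result.get("_meta", {}).get("error_details", {})
    if not result.get("isError") or details.get("error") != "trust_tier_mismatch":
        return None  # not a tier mismatch, nothing to upgrade
    needed = TIER_RANK[details["required_tier"]]
    candidates = [
        (TIER_RANK[tier], name)
        for name, tier in provider_tiers.items()
        if tier in TIER_RANK and TIER_RANK[tier] >= needed
    ]
    # Prefer the lowest tier that still qualifies; the name breaks ties.
    return min(candidates)[1] if candidates else None
```

Returning `None` is the signal to halt and escalate rather than retry.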

The system did exactly what it was designed to do. It was just wrong about what “surgery” meant in that context. I fixed it with escape hatches: a research-query detector that downgrades short lookups about public entities, and an editorial-content detector that downgrades long-form writing discussing sensitive topics abstractly. Ham-fisted, imperfect, often-hilarious heuristics. And yet, the alternative?

For enterprises, this Python and regex layer gets replaced by a Data Loss Prevention engine. The compute tax goes from 2ms to maybe 50ms. Still a rounding error against a 2,000ms inference call. The engine doesn’t matter. The pattern does: classify locally, annotate the response, route globally.

Everything above protects outgoing prompts, handling them as an egress problem: the user’s data headed toward an untrusted inference provider. But for enterprise MCP servers providing data to third-party agents, the same trust logic applies in reverse: it’s an ingress problem. The server must classify its own response data and verify that the requesting agent is qualified to receive it. The metadata paradigm is identical. The enforcement point moves from the client to the server.

When the MCP server receives search_contacts('Sarah Chen') and builds the response, it annotates it. Structured fields get deterministic sensitivity tags at the database level. Unstructured fields were classified at ingestion, skipping additional latency at query time.

The MCP response:

{
    "content": [{"type": "text",
        "text": "{\"name\":\"Sarah Chen\",\"email\":\"sarah@example.com\",\"interactions\":[...]}"}],
    "_meta": {
        "com.kimono/sensitivity": {
            "level": "confidential",
            "categories": ["pii", "health"],
            "allowed_regions": ["US"],
            "compliance_context": ["healthcare_workflow"]
        }
    }
}

The sensitivity metadata lives in _meta, visible to runtimes and gateways but not burned into the LLM’s context window. allowed_regions pays off the jurisdiction argument from Part 1: if you route health data to an inference provider in a jurisdiction without equivalent privacy protections, the compliance failure isn’t hypothetical.
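A rough sketch of the server-side annotation step for structured fields, assuming column-level tags are available as a simple lookup. `FIELD_TAGS` and `annotate` are hypothetical, untagged fields fail closed to confidential as an assumption, and `compliance_context` is omitted for brevity:

```python
import json

LEVEL_ORDER = ["public", "internal", "confidential", "secret"]

# Hypothetical column-level tags: field name -> (level, category or None).
FIELD_TAGS = {
    "name": ("confidential", "pii"),
    "email": ("confidential", "pii"),
    "diagnosis": ("confidential", "health"),
    "company": ("public", None),
}


def annotate(record: dict, allowed_regions: list[str]) -> dict:
    """Wrap a record as a tool result with sensitivity metadata in _meta."""
    level = "public"
    categories = set()
    for field in record:
        # Assumption: a field with no tag is treated as confidential.
        field_level, category = FIELD_TAGS.get(field, ("confidential", None))
        if LEVEL_ORDER.index(field_level) > LEVEL_ORDER.index(level):
            level = field_level
        if category:
            categories.add(category)
    return {
        "content": [{"type": "text", "text": json.dumps(record)}],
        "_meta": {
            "com.kimono/sensitivity": {
                "level": level,
                "categories": sorted(categories),
                "allowed_regions": allowed_regions,
            }
        },
    }
```

The payload level is simply the highest level of any field present, so adding one sensitive column raises the whole response.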

MCP already supports free-form _meta on tool results. A namespaced extension (com.kimono/sensitivity) signals intent without requiring a spec change. If the vocabulary proves useful, it graduates to a standard result-level metadata schema.

What MCP lacks is a standardised sensitivity vocabulary and enforcement semantics: a shared way for servers, runtimes, and gateways to interpret what “confidential” means and what to do about it. Right now, every MCP server builder who handles sensitive data will build their own incompatible version until the protocol standardises the vocabulary.

That’s the worst of both worlds: the overhead of the protocol without the safety it should provide.

In a closed ecosystem (Bloomberg’s internal agents, where the enterprise controls both the MCP server and every connecting agent) the answer is straightforward. The runtime reads the sensitivity annotation, checks the provider registry, and blocks mismatches. The metadata is the enforcement because the runtime is trusted.

The primary threat in a closed ecosystem is a developer who picked the cheapest inference provider without checking jurisdiction, an agent framework that routes to whatever model responds fastest, or an intern who copied something from a sketchy website. With these failure modes, sensitivity metadata gives engineers the primitive they need.

In an open ecosystem (like Kimono), metadata isn’t enough. A regulated data custodian can’t hand clear-text PHI or other regulated personal data to an unknown agent and hope.

The trust gateway sits server-side. Because we can’t control a third-party agent’s downstream requests, the gateway acts before data crosses the wire. It checks the agent’s verified trust level against the payload’s sensitivity. Verified, not self-declared. A malicious agent will lie. The trust tier must be bound to the OAuth Client ID at registration or verified via cryptographic attestation.

If the tier is insufficient, the gateway blocks the response entirely and returns a structured trust_tier_mismatch error. Silent redaction is dangerous for autonomous agents: an LLM that expects a phone number and gets [REDACTED] hallucinates a fake one or loops. Structured rejection lets the runtime halt and trigger a step-up trust flow.
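The gateway check described above might look something like this. It is a sketch under stated assumptions: `CLIENT_TIERS` stands in for a registry that binds a tier to each OAuth client ID at registration, the client IDs are invented, unknown clients are treated as untrusted, and a response with no sensitivity annotation fails closed to secret.

```python
TIER_RANK = {"untrusted": 0, "standard": 1, "trusted": 2, "sovereign": 3}

SENSITIVITY_TIER_REQUIRED = {
    "public": "untrusted",
    "internal": "standard",
    "confidential": "trusted",
    "secret": "sovereign",
}

# Hypothetical registry: tier bound to the OAuth client ID at registration.
CLIENT_TIERS = {
    "client-clinic-app": "trusted",
    "client-unknown-agent": "standard",
}


def gate(client_id: str, response: dict) -> dict:
    """Pass the response through, or replace it with a structured rejection."""
    meta = response.get("_meta", {}).get("com.kimono/sensitivity", {})
    level = meta.get("level", "secret")  # assumption: unannotated fails closed
    required = SENSITIVITY_TIER_REQUIRED[level]
    # The tier comes from the registry, never from anything the agent sends.
    bound = CLIENT_TIERS.get(client_id, "untrusted")
    if TIER_RANK[bound] >= TIER_RANK[required]:
        return response
    # Block entirely: none of the original content is copied into the error.
    return {
        "content": [{
            "type": "text",
            "text": f"Trust tier mismatch: requires '{required}' or higher",
        }],
        "_meta": {
            "error_details": {
                "error": "trust_tier_mismatch",
                "detected_sensitivity": level,
                "blocked_categories": meta.get("categories", []),
                "bound_tier": bound,
                "required_tier": required,
            }
        },
        "isError": True,
    }
```

The rejection carries enough detail for a runtime to start a step-up flow without ever seeing the blocked payload.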

For multi-model clients where users toggle providers mid-session, static Client ID binding isn’t enough. The gateway needs session-level attestation, and the runtime must ratchet the high-water mark of session sensitivity: once a session ingests confidential data, the entire context window is tainted. A subsequent benign request (“summarise the above”) routed to a cheaper, lower-tier provider would carry the confidential context with it.

The runtime can’t dynamically downgrade without wiping or locally summarising the history. MCP’s own roadmap already includes work on DPoP and Workload Identity Federation, and session-level trust binding is a natural extension of that arc.
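The high-water mark can be sketched as a small piece of per-session state. The `Session` class and its method names are hypothetical, and the tier ordering is the same assumed ranking used elsewhere (untrusted < standard < trusted < sovereign):

```python
LEVEL_ORDER = ["public", "internal", "confidential", "secret"]
TIER_RANK = {"untrusted": 0, "standard": 1, "trusted": 2, "sovereign": 3}

SENSITIVITY_TIER_REQUIRED = {
    "public": "untrusted",
    "internal": "standard",
    "confidential": "trusted",
    "secret": "sovereign",
}


class Session:
    """Tracks the most sensitive content a session's context has ingested."""

    def __init__(self) -> None:
        self.high_water = "public"

    def ingest(self, level: str) -> None:
        # Ratchet: the mark only ever moves up during normal operation.
        if LEVEL_ORDER.index(level) > LEVEL_ORDER.index(self.high_water):
            self.high_water = level

    def can_route_to(self, provider_tier: str) -> bool:
        # Every request carries the whole context, so the session's mark,
        # not the current message, decides which providers qualify.
        required = SENSITIVITY_TIER_REQUIRED[self.high_water]
        return TIER_RANK[provider_tier] >= TIER_RANK[required]

    def wipe(self) -> None:
        # Call only after the history has actually been cleared.
        self.high_water = "public"
```

A session that has ingested confidential data refuses a standard-tier provider even for a benign follow-up, until the history is wiped.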

The metadata enables the gateway, and the gateway enforces the policy.

The summit opened today. The agenda is rich on authentication and enterprise patterns, thin on payload sensitivity. MCP’s 2026 roadmap has made real progress on authorisation, transport evolution, and enterprise readiness. What it still doesn’t standardise is content sensitivity.

Classification is easier than you think, and harder than you think, at the same time. For structured data (the kind that comes from databases with typed columns) classification is deterministic and free. An email column is PII. A diagnosis field is medical. The hard part is unstructured content: email bodies, message threads, free-text notes. You need regex patterns, DLP heuristics, or local SLMs. The good news: most MCP tools return structured data. Edge cases (free-text search results, conversation transcripts) are the minority, and they can be classified at ingestion rather than query time.

The registry is a political problem. Rolling my own was easy. Building a registry that Bloomberg, Morgan Stanley, Anthropic, OpenAI, and DeepSeek all accept as legitimate is about governance. Do we need the equivalent of SSL CAs for AI inference routing? The Agentic AI Foundation already governs MCP under the Linux Foundation. Maybe the same body that standardises the protocol should standardise the trust metadata.

This ain’t new. Content classification primitives are mature in enterprise DLP. Sensitivity annotations are well-understood metadata. Trust registries exist in PKI, certificate authorities, and payment processing. MCP already treats some server-supplied hints cautiously. Sensitivity metadata should inherit the same lesson. In closed ecosystems, your own server’s metadata is a trustworthy primitive. In open ones, it’s not enough by itself.

After six months, the system classifies roughly 400 requests per day. About 8% get rerouted from a lower-tier provider to a higher one. About 2% get hard-blocked. The most common trigger is the health category, usually Oura Ring data in a prompt that was headed to a provider with 30-day retention. Each one is a potential policy breach that didn’t happen.

In the final essay, I’ll step back from the implementation and propose the protocol-level changes: what the sensitivity metadata schema should look like in the MCP spec, who should govern the trust-tier registry, and what enforcement looks like at the gateway level for open ecosystems.

I’ll also return to WebMCP and the browser, where the Same-Origin Policy gives you data isolation but not data classification, and where the trust gap is, if anything, more dangerous because the user is even further from the routing decisions.

Tomorrow: Part 3, “The Guard Who Can Read.”