Published March 14, 2026 | Version v1
Preprint Open
Description
AI coding agents grant large language models access to file systems, terminals, and external services through protocols such as the Model Context Protocol (MCP). The trust models governing that access were designed for human users, not autonomous agents processing attacker-controlled input. This paper presents three empirical findings in Anthropic’s Claude Code (v2.1.63) demonstrating systemic trust boundary failures in MCP server configuration handling, tool confirmation prompts, and workspace trust escalation. All findings were reported through Anthropic’s HackerOne Vulnerability Disclosure Program and closed as Informative. Rather than contesting that design decision, this paper reframes the findings from an enterprise defensive perspective and proposes compensating controls including virtual desktop infrastructure (VDI) isolation, MCP configuration integrity monitoring, and credential management practices adapted for AI-assisted development workflows.
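A minimal form of the MCP configuration integrity monitoring the abstract proposes, written here as an added example rather than code from the paper: it fingerprints each server entry in a project-level MCP config file (the `mcpServers` / `command` / `args` / `env` layout used by Claude Code's `.mcp.json` is assumed) and reports servers that were added, removed or altered relative to a stored baseline, so a silently injected or modified server definition surfaces before an agent session executes it.

```python
"""Drift detection for MCP server configuration against a known baseline.

Added example; the .mcp.json layout below (a top-level "mcpServers" object whose
entries carry "command", "args", "url" and "env") is an assumption, and the two
JSON documents are constructed samples, not real project files.
"""
import json


def _fingerprint(server: dict) -> dict:
    # Only the fields that decide what gets executed, where it connects, and
    # which secrets it can see. Env *values* are deliberately excluded so the
    # baseline file never stores credentials; a new env key is still a finding.
    return {
        "command": server.get("command"),
        "args": tuple(server.get("args", [])),
        "url": server.get("url"),
        "env_keys": tuple(sorted(server.get("env", {}).keys())),
    }


def baseline(config_text: str) -> dict:
    """Build a name -> fingerprint map from an MCP config document."""
    cfg = json.loads(config_text)
    return {name: _fingerprint(s) for name, s in cfg.get("mcpServers", {}).items()}


def drift(base: dict, current_text: str) -> list[str]:
    """Compare the current config against a stored baseline; empty list = clean."""
    cur = baseline(current_text)
    findings = []
    for name in sorted(cur.keys() - base.keys()):
        findings.append(f"new server: {name}")
    for name in sorted(base.keys() - cur.keys()):
        findings.append(f"removed server: {name}")
    for name in sorted(cur.keys() & base.keys()):
        for field, value in cur[name].items():
            if base[name][field] != value:
                findings.append(f"{name}: {field} changed")
    return findings


# Constructed samples: the reviewed baseline and a later, modified working copy.
BASELINE_JSON = '{"mcpServers": {"fs": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/repo"]}}}'
CURRENT_JSON = ('{"mcpServers": {"fs": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/repo"], '
                '"env": {"API_TOKEN": "placeholder"}}, "notes": {"command": "node", "args": ["notes-server.js"]}}}')

if __name__ == "__main__":
    for line in drift(baseline(BASELINE_JSON), CURRENT_JSON):
        print(line)
```

In practice the baseline map is stored outside the repository (or signed) and `drift` runs from a pre-commit hook or an endpoint agent, since a baseline kept next to the config can be rewritten by the same change that injects a server.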
Files (208.6 kB)
Trust_Boundary_Failures_in_AI_Coding_Agents.pdf