ACP
Agent Control Protocol
An open protocol for AI agents to operate existing application user interfaces.
The Missing Protocol
| Protocol | Origin | Connects |
|---|---|---|
| MCP | Anthropic | LLM ↔ Data / Tools |
| A2A | Google | Agent ↔ Agent |
| AG-UI | CopilotKit | Agent → Frontend streaming |
| A2UI | Google | Agent → Generated UI |
| ACP | | Agent ↔ Existing Application UI |
Existing protocols let agents access data, coordinate with other agents, stream events to frontends, and generate new UI components. None of them allow an agent to operate an existing application's interface. ACP fills this gap.
Three Steps
Step 1
Describe
Your app sends a manifest describing its screens, fields, actions, and modals. This is the agent's map of the interface.
Step 2
Converse
The user sends natural language. The agent interprets intent using the manifest, knowing exactly what fields and actions are available.
Step 3
Execute
The agent sends commands such as `set_field`, `click`, and `navigate`. The SDK executes them against the live UI and reports results back.
See It In Action
Three agents (Gemini, DeepSeek, Haiku) completing a full task in parallel using ACP.
1. Application sends manifest
```json
{
  "type": "manifest",
  "app": "contact-portal",
  "currentScreen": "contact",
  "screens": {
    "contact": {
      "id": "contact",
      "label": "Contact Form",
      "fields": [
        { "id": "name", "type": "text", "label": "Full Name", "required": true },
        { "id": "email", "type": "email", "label": "Email", "required": true },
        { "id": "message", "type": "textarea", "label": "Message" }
      ],
      "actions": [
        { "id": "submit", "label": "Send Message" }
      ]
    }
  }
}
```
2. Agent responds with commands
```json
{
  "type": "command",
  "seq": 1,
  "actions": [
    { "do": "set_field", "field": "name", "value": "Alice Park" },
    { "do": "set_field", "field": "email", "value": "[email protected]" },
    { "do": "set_field", "field": "message", "value": "Hello, I need help resetting my account." },
    { "do": "click", "action": "submit" }
  ]
}
```
3. SDK reports results
```json
{
  "type": "result",
  "seq": 1,
  "results": [
    { "index": 0, "success": true },
    { "index": 1, "success": true },
    { "index": 2, "success": true },
    { "index": 3, "success": true }
  ]
}
```
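To make the exchange above concrete, here is a sketch of how an SDK could execute a command batch and build the per-action result message. The names (`executeBatch`, `UIAdapter`) are illustrative assumptions, not part of the ACP spec; a real SDK would bind the adapter to its platform's widget layer.

```typescript
// Message shapes mirroring the protocol examples above.
type Action =
  | { do: "set_field"; field: string; value: string }
  | { do: "click"; action: string }
  | { do: "navigate"; screen: string };

interface Command { type: "command"; seq: number; actions: Action[] }
interface ActionResult { index: number; success: boolean; error?: string }
interface Result { type: "result"; seq: number; results: ActionResult[] }

// Hypothetical per-platform adapter the SDK would supply (web, Flutter, ...).
interface UIAdapter {
  setField(field: string, value: string): void;
  click(action: string): void;
  navigate(screen: string): void;
}

// Execute each action in order; report success or failure per index.
function executeBatch(cmd: Command, ui: UIAdapter): Result {
  const results: ActionResult[] = cmd.actions.map((a, index) => {
    try {
      switch (a.do) {
        case "set_field": ui.setField(a.field, a.value); break;
        case "click": ui.click(a.action); break;
        case "navigate": ui.navigate(a.screen); break;
      }
      return { index, success: true };
    } catch (e) {
      return { index, success: false, error: String(e) };
    }
  });
  return { type: "result", seq: cmd.seq, results };
}
```

Because every action reports individually, the agent can see exactly which step of a batch failed and retry or re-plan from that point.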
Protocol At A Glance
8 UI Actions
`navigate`, `set_field`, `clear`, `click`, `show_toast`, `ask_confirm`, `open_modal`, `close_modal`
15 Field Types
`text`, `number`, `currency`, `date`, `datetime`, `email`, `phone`, `masked`, `select`, `autocomplete`, `checkbox`, `radio`, `textarea`, `file`, `hidden`
Manifest Structure
Screens, fields, actions, modals -- everything the agent needs to understand the application's UI and its current state.
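The manifest's structure can be sketched as TypeScript types inferred from the example above. These shapes (and the `requiredFields` helper) are illustrative, not normative; consult the spec for the full schema, including modals.

```typescript
// Field types as listed in the protocol summary above.
type FieldType =
  | "text" | "number" | "currency" | "date" | "datetime" | "email"
  | "phone" | "masked" | "select" | "autocomplete" | "checkbox"
  | "radio" | "textarea" | "file" | "hidden";

interface FieldSpec {
  id: string;
  type: FieldType;
  label: string;
  required?: boolean;
}

interface ActionSpec { id: string; label: string }

interface ScreenSpec {
  id: string;
  label: string;
  fields: FieldSpec[];
  actions: ActionSpec[];
}

interface Manifest {
  type: "manifest";
  app: string;
  currentScreen: string;
  screens: Record<string, ScreenSpec>;
}

// Illustrative helper: which fields must be filled on the current screen?
function requiredFields(m: Manifest): string[] {
  return m.screens[m.currentScreen].fields
    .filter(f => f.required)
    .map(f => f.id);
}
```

Typed shapes like these are what let the agent plan with certainty: it can check that a field exists and has the right type before ever sending a command.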
Command-Result Loop
The agent sends commands with sequence IDs. The SDK reports success or failure per action, enabling reliable multi-step workflows.
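The sequence-ID correlation described above can be sketched from the agent side as a small dispatcher that matches each incoming result to the command that produced it. `CommandLoop` and its transport callback are hypothetical names, assumed for illustration.

```typescript
interface Result {
  type: "result";
  seq: number;
  results: { index: number; success: boolean }[];
}

// Hypothetical agent-side helper: correlates results to commands by seq.
class CommandLoop {
  private seq = 0;
  private pending = new Map<number, (r: Result) => void>();

  // `send` is the outbound transport (e.g. a WebSocket send), assumed here.
  constructor(private send: (msg: object) => void) {}

  // Send a batch of actions; resolves when the matching result arrives.
  dispatch(actions: object[]): Promise<Result> {
    const seq = ++this.seq;
    this.send({ type: "command", seq, actions });
    return new Promise(resolve => this.pending.set(seq, resolve));
  }

  // Feed incoming result messages from the SDK here.
  onMessage(msg: Result): void {
    const resolve = this.pending.get(msg.seq);
    if (resolve) {
      this.pending.delete(msg.seq);
      resolve(msg);
    }
  }
}
```

Keying pending commands by `seq` means the agent can have several batches in flight and still attribute each success or failure to the right step of its plan.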
Structured, Not Scraped
- **Vision:** Screenshot analysis is slow, expensive in tokens, and fragile across resolutions and themes. A single UI redesign breaks everything.
- **DOM scraping:** Couples the agent to implementation details that change with every deploy, and does not work at all on native mobile or desktop.
- **RPA:** Heavyweight, enterprise-only, batch-oriented. Not designed for real-time conversational interaction.
- **ACP:** The application declares its own structure. The agent operates with certainty, not heuristics, and works on any platform (web, mobile, desktop) because the SDK mediates between the protocol and the native UI layer.
Implementations
| Implementation | Type | Platform | Status |
|---|---|---|---|
| Vocall Engine by Primoia | Server | Go | Production |
| vocall_sdk by Primoia | SDK | Flutter | Production |
| vocall-react by Primoia | SDK | React / Next.js | Production |
The spec and conformance tests are all you need to build your own implementation. List yours here.