GitHub - agent-control-protocol/acp: Interactive demo — AI agent filling forms via ACP (Agent Control Protocol)

ACP lets AI agents control any existing application through a structured protocol -- no vision models, no DOM scraping, no guessing.

Live Demo | Specification | Reference Server | Discussions

The Problem

AI agents can access data (MCP), talk to other agents (A2A), stream events to frontends (AG-UI), and generate new UI (A2UI). None of these protocols, however, lets an agent operate an existing application's interface -- the screens, forms, and actions your users already work with.

How ACP Solves It

The application declares its UI structure through a manifest. The agent sends structured commands -- set_field, click, navigate -- and the SDK executes them against the live interface.

sequenceDiagram
    participant User
    participant App as Application + SDK
    participant Agent as Agent Engine

    App->>Agent: manifest (screens, fields, actions)
    User->>App: "Fill out the contact form"
    App->>Agent: user message
    Agent->>App: commands (set_field, click)
    App->>Agent: results (ok/fail per action)
    Agent->>App: chat "Done, form submitted."
    App->>User: response + UI updated

How ACP Compares

	ACP	MCP	A2A	AG-UI	RPA
Purpose	Agent controls existing UI	Agent accesses data/tools	Agent-to-agent coordination	Agent streams to frontend	Batch UI automation
Operates existing UI?	Yes	No	No	No (frontend must implement handlers)	Yes (via vision/DOM)
Requires vision/DOM?	No (structured manifest)	N/A	N/A	N/A	Yes
Real-time conversational?	Yes	Yes	Yes	Yes	No
Multi-platform?	Web, mobile, desktop	API-dependent	API-dependent	Web only	Platform-specific
Token cost	Low (structured data)	Low	Low	Low	High (screenshots)
Fragile to UI changes?	No (manifest-driven)	N/A	N/A	N/A	Yes

ACP is complementary to these protocols, not a replacement. Use MCP for data access, A2A for agent coordination, and ACP when the agent needs to operate application interfaces.

Getting Started

1. Run the demo

No setup required: try the live sandbox.

Or run locally:

git clone https://github.com/agent-control-protocol/acp-demo.git
cd acp-demo && npm install
cp .env.example .env   # add your OpenAI API key
npm start              # open http://localhost:3098

Type "Register my dog Max, owner Sarah Connor, sarah@skynet.com" and watch the agent fill the form.

2. Integrate ACP into your application

Application side (SDK) -- Describe your UI as an ACP manifest:

{
  "type": "manifest",
  "app": "my-app",
  "currentScreen": "contact",
  "screens": {
    "contact": {
      "id": "contact",
      "label": "Contact Form",
      "fields": [
        { "id": "name", "type": "text", "label": "Full Name", "required": true },
        { "id": "email", "type": "email", "label": "Email", "required": true },
        { "id": "message", "type": "textarea", "label": "Message" }
      ],
      "actions": [
        { "id": "submit", "label": "Send Message" }
      ]
    }
  }
}

Connect to an ACP-compliant engine via WebSocket, send the manifest, and handle incoming commands (set_field, click, etc.) against your UI. See the demo source for a full working example.
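As a rough illustration of the SDK side, the sketch below turns an incoming command message into a result message, using the message shapes from the examples in this README. `UiAdapter` and `createCommandHandler` are hypothetical names invented for this sketch, not part of ACP or any published SDK, and only `set_field` and `click` are handled.

```typescript
// Message shapes copied from the examples in this README; only two of the
// protocol's actions are handled to keep the sketch short.
type Action =
  | { do: "set_field"; field: string; value: string }
  | { do: "click"; action: string };

interface CommandMessage {
  type: "command";
  seq: number;
  actions: Action[];
}

// Hypothetical adapter (not part of ACP): each method returns true on success.
interface UiAdapter {
  setField(fieldId: string, value: string): boolean;
  click(actionId: string): boolean;
}

// Returns a function that handles one raw message from the engine.
function createCommandHandler(ui: UiAdapter, send: (data: string) => void) {
  return (raw: string): void => {
    const msg = JSON.parse(raw) as { type?: string };
    if (msg.type !== "command") return; // other message types are ignored here

    const cmd = msg as CommandMessage;
    const results = cmd.actions.map((action, index) => {
      let success = false;
      try {
        if (action.do === "set_field") success = ui.setField(action.field, action.value);
        else if (action.do === "click") success = ui.click(action.action);
        // Any other action falls through and is reported as failed.
      } catch {
        success = false; // a throwing adapter counts as a failed action
      }
      return { index, success };
    });

    // One result entry per action, echoing the command's sequence number.
    send(JSON.stringify({ type: "result", seq: cmd.seq, results }));
  };
}

// Wiring with the standard WebSocket API (the URL is a placeholder):
//   const ws = new WebSocket("ws://localhost:3099");
//   const handle = createCommandHandler(myUi, (data) => ws.send(data));
//   ws.onopen = () => ws.send(JSON.stringify(manifest));
//   ws.onmessage = (ev) => handle(String(ev.data));
```

Keeping the handler independent of the socket makes it easy to exercise against a fake `send` function before connecting to a real engine.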

Agent side (Engine) -- Use the reference server:

npm install @acprotocol/server

import { createACPServer } from "@acprotocol/server";

const server = createACPServer({
  port: 3099,
  openaiApiKey: process.env.OPENAI_API_KEY,
});

server.start();

The engine receives manifests, interprets user messages using the UI structure as context, and sends back commands. See the reference server docs for configuration options.
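To make "using the UI structure as context" more concrete, here is one way an engine might flatten the current screen of a manifest into plain text for a language model prompt. This is only a sketch: the `describeScreen` helper and its output format are assumptions made for illustration and do not describe how `@acprotocol/server` builds its prompts.

```typescript
// Manifest shape as shown in the example manifest in this README.
interface Field { id: string; type: string; label: string; required?: boolean }
interface ScreenAction { id: string; label: string }
interface Screen { id: string; label: string; fields: Field[]; actions: ScreenAction[] }
interface Manifest {
  type: "manifest";
  app: string;
  currentScreen: string;
  screens: Record<string, Screen>;
}

// Hypothetical helper: summarizes the current screen as prompt-friendly text.
function describeScreen(manifest: Manifest): string {
  const screen = manifest.screens[manifest.currentScreen];
  if (!screen) throw new Error(`Unknown screen: ${manifest.currentScreen}`);

  const lines = [`App: ${manifest.app}`, `Screen: ${screen.label} (${screen.id})`, "Fields:"];
  for (const f of screen.fields) {
    lines.push(`- ${f.id} [${f.type}] "${f.label}"${f.required ? " (required)" : ""}`);
  }
  lines.push("Actions:");
  for (const a of screen.actions) {
    lines.push(`- ${a.id} "${a.label}"`);
  }
  return lines.join("\n");
}
```

Because the summary uses the same field and action ids as the manifest, a model that sees it can refer to those ids directly in the commands it emits.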

3. Understand the message flow

The user sends natural language. The agent reads the manifest, understands the UI, and responds with commands:

// Agent sends commands
{
  "type": "command",
  "seq": 1,
  "actions": [
    { "do": "set_field", "field": "name", "value": "Alice Park" },
    { "do": "set_field", "field": "email", "value": "alice@example.com" },
    { "do": "set_field", "field": "message", "value": "I need help resetting my account." },
    { "do": "click", "action": "submit" }
  ]
}

// SDK reports results
{
  "type": "result",
  "seq": 1,
  "results": [
    { "index": 0, "success": true },
    { "index": 1, "success": true },
    { "index": 2, "success": true },
    { "index": 3, "success": true }
  ]
}
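On the agent side, the `seq` and `index` fields let an engine pair each result with the action that produced it. The sketch below shows one possible check; `failedActions` is a hypothetical helper, and treating an action with no reported result as failed is an assumption of this sketch rather than something stated in this README.

```typescript
// Shapes taken from the command and result examples above.
interface Command {
  type: "command";
  seq: number;
  actions: Array<{ do: string; [key: string]: unknown }>;
}
interface Result {
  type: "result";
  seq: number;
  results: Array<{ index: number; success: boolean }>;
}

// Hypothetical helper: returns the actions that did not succeed, with their indexes.
function failedActions(command: Command, result: Result) {
  if (result.seq !== command.seq) {
    throw new Error(`seq mismatch: sent ${command.seq}, got ${result.seq}`);
  }
  return command.actions
    .map((action, index) => ({ index, action }))
    .filter(({ index }) =>
      // Assumption: an action without a matching successful result counts as failed.
      !result.results.some((r) => r.index === index && r.success)
    );
}
```

An engine could use the returned list to retry, ask the user for a correction, or report which step of a multi-step workflow stopped.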

What ACP Defines

8 UI Actions: navigate, set_field, clear, click, show_toast, ask_confirm, open_modal, close_modal
15 Field Types: text, number, currency, date, datetime, email, phone, masked, select, autocomplete, checkbox, radio, textarea, file, hidden
Manifest Structure: screens, fields, actions, modals -- everything the agent needs to understand the application's UI and current state
Command-Result Loop: the agent sends commands with sequence IDs; the SDK reports success or failure per action, enabling reliable multi-step workflows
Streaming: token-by-token chat responses for real-time UX alongside command execution
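For TypeScript implementations, the action and field-type names listed above can be captured as literal union types so that typos surface at compile time. This is a sketch only: the constant and function names are chosen for illustration, and `spec/acp-v2.json` remains the authoritative definition.

```typescript
// The 8 UI actions and 15 field types, as listed in this README.
const UI_ACTIONS = [
  "navigate", "set_field", "clear", "click",
  "show_toast", "ask_confirm", "open_modal", "close_modal",
] as const;
type UiAction = (typeof UI_ACTIONS)[number];

const FIELD_TYPES = [
  "text", "number", "currency", "date", "datetime",
  "email", "phone", "masked", "select", "autocomplete",
  "checkbox", "radio", "textarea", "file", "hidden",
] as const;
type FieldType = (typeof FIELD_TYPES)[number];

// Runtime guards for values arriving as untyped JSON.
function isUiAction(value: string): value is UiAction {
  return (UI_ACTIONS as readonly string[]).indexOf(value) !== -1;
}

function isFieldType(value: string): value is FieldType {
  return (FIELD_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

The guards are useful at the boundary where messages are parsed, for example to reject a command whose `do` value is not one of the eight actions.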

Why Not Vision/Scraping?

Vision-based approaches (screenshot analysis, pixel coordinates) are slow, expensive in tokens, and fragile across resolutions and themes. A single UI redesign can break the automation.
DOM scraping couples the agent to implementation details that change on every deploy. It does not work on native mobile or desktop applications at all.
RPA tools are heavyweight, enterprise-only, and designed for batch automation -- not real-time conversational interaction.
ACP: the application declares its own structure. The agent operates with certainty, not heuristics. Works on any platform -- web, mobile, desktop -- because the SDK mediates between the protocol and the native UI layer.

Protocol, Not Product

ACP is a protocol specification, not a product. Anyone can implement an ACP-compliant engine (the agent side) or SDK (the application side). The protocol defines the contract between them.

The first production implementation is Vocall Engine by Primoia, which implements ACP alongside voice interaction.

Specification

Document	Description
`spec/acp-v2.json`	JSON Schema for all ACP message types
`spec/SPEC.md`	Formal specification (message lifecycle, error handling, sequencing)
`examples/`	Annotated example message exchanges
`conformance/`	Conformance test suite for validating implementations

Implementations

Implementation	Type	Platform	Status
Vocall Engine by Primoia	Server	Go	Production
vocall_sdk by Primoia	SDK	Flutter	Production
vocall-react by Primoia	SDK	React / Next.js	Production
`@acprotocol/server`	Server (Reference)	TypeScript	Beta
acp-demo	Interactive Demo	TypeScript	Beta

Building an ACP implementation? Open a PR to add it to this table.

Extensions

The core protocol handles text interaction and UI control. Implementations MAY extend the protocol to support additional modalities such as voice interaction, haptic feedback, or accessibility features. Extensions should be namespaced to avoid conflicts with future protocol versions.
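This README does not say what a namespaced extension looks like. Purely as an illustration, the sketch below assumes a `namespace.name` convention for extension action names (for example `voice.speak`) and shows how an SDK might tell core actions, extensions, and unknown names apart. Both the convention and the `classifyAction` helper are hypothetical.

```typescript
// Core action names as listed in this README.
const CORE_ACTIONS = [
  "navigate", "set_field", "clear", "click",
  "show_toast", "ask_confirm", "open_modal", "close_modal",
];

type Classified =
  | { kind: "core" }
  | { kind: "extension"; namespace: string; name: string }
  | { kind: "unknown" };

// Hypothetical convention: extension actions are written "<namespace>.<name>".
function classifyAction(actionName: string): Classified {
  if (CORE_ACTIONS.indexOf(actionName) !== -1) return { kind: "core" };

  const match = /^([a-z][a-z0-9_-]*)\.([a-z][a-z0-9_]*)$/.exec(actionName);
  if (match) return { kind: "extension", namespace: match[1], name: match[2] };

  // Un-namespaced names outside the core set are rejected, which leaves room
  // for future protocol versions to add core actions without collisions.
  return { kind: "unknown" };
}
```

Under this assumption, an SDK could route `extension` results to handlers registered per namespace and report `unknown` actions as failed.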

Community

GitHub Discussions -- Questions, ideas, and general discussion
Issue Tracker -- Bug reports and feature requests
Code of Conduct
Security Policy

Contributing

See CONTRIBUTING.md for guidelines on proposing changes, reporting issues, and submitting implementations.

License

Apache 2.0 -- see LICENSE for details.