Playwriter is an open-source browser automation tool that enables AI agents to control a user's existing Chrome browser instance by executing Playwright code within a secure, stateful sandbox. It combines a Chrome extension, a command-line interface (CLI), and Model Context Protocol (MCP) integration to provide full, unrestricted access to the Playwright API (including network interception, log capture, locator inspection, node removal, debugging, and profiling) while ensuring security through explicit user consent, visible automation indicators, and isolated script execution. Unlike traditional headless browser automation, Playwriter operates on the user's visible browser session to avoid common detection mechanisms and reduce resource usage by eliminating the need for separate browser instances.[1][2]
The tool's architecture centers on a Chrome extension that connects to user-approved tabs (indicated by a green icon upon activation) and uses the chrome.debugger API to enable control. A local WebSocket server running on localhost:19988 facilitates communication between the extension, the CLI, and MCP clients, ensuring all operations remain local and restricted to authorized connections from the official extension. Scripts execute in a stateful sandbox where variables (such as state) persist across executions, while browser tabs are shared but maintain isolated script states. Security features include origin validation, explicit per-tab consent, and a visible Chrome automation banner on controlled tabs to prevent unauthorized or hidden automation.[1][3]
Playwriter distinguishes itself through advanced capabilities tailored for AI agent compatibility, including Vimium-style visual element labels (color-coded for links, buttons, inputs, checkboxes, and more) and screenshotWithAccessibilityLabels functionality that generates screenshots paired with accessibility snapshots using aria-ref selectors for reliable element targeting. It supports remote CLI usage (e.g., from devcontainers, VMs, or SSH) via host-token authentication, allowing agents to execute commands like playwriter -s 1 -e "await page.goto('https://example.com')" or multiline scripts. Full Playwright API exposure enables sophisticated operations, such as network request interception (page.on('response', ...)), console log capture, live code editing via state.editor.edit(), and CDP-based debugging with breakpoints.[1][2]
The project is hosted on GitHub under the MIT license, developed by remorses, and distributed via the Chrome Web Store. It emphasizes agent-friendly design with low context bloat, extensive command support, and seamless integration into AI workflows through MCP clients.[1][3]
Overview
Description
Playwriter is an open-source browser automation tool that enables AI agents to control a user's existing Chrome browser instance by executing Playwright code snippets. It combines a Chrome extension, a command-line interface (CLI), and Model Context Protocol (MCP) support to facilitate this control, allowing agents to automate tasks directly within the user's familiar browser environment rather than launching isolated instances.[1][3]
The primary purpose of Playwriter is to provide AI agents with unrestricted access to the full Playwright API while maintaining a secure, stateful sandbox environment that preserves session state, logged-in credentials, and installed extensions across executions. This design supports complex, context-aware automation by running code against the user's live browser tabs, with explicit user consent required for tab activation.[2][3]
Playwriter emphasizes low context overhead for AI agents through a single execute tool that sends Playwright snippets, enabling efficient integration with agent frameworks while offering comprehensive capabilities such as network interception, debugging, and element inspection. The tool operates locally via a WebSocket server, ensuring security through origin validation, visible automation indicators, and no remote access by default.[2][3]
History and Development
Playwriter was developed as an open-source project to enable AI agents to control users' existing Chrome browser sessions through Playwright code executed in a secure, stateful sandbox, addressing limitations in tools like Playwright MCP that spawn separate browser instances and lack support for existing extensions or user workflows.[1][2] The project originated in late 2025 under the GitHub username "remorses" (associated with developer Tommaso De Rossi), with initial development activity focused on establishing core infrastructure, including CI integration starting around November 16, 2025.[1] Early efforts emphasized building a Chrome extension that communicates via CDP and MCP, allowing arbitrary Playwright snippets to run sandboxed while connecting to the user's live browser rather than launching a new one.[1][4]
Public introduction occurred via a Show HN post on Hacker News in November 2025, where the tool was positioned as an improvement over alternatives by reducing context bloat, enabling human-AI collaboration in the same browser window, and providing unrestricted Playwright API access (including network interception and debugging) without disrupting normal usage.[4] The Chrome extension was published on the Chrome Web Store, with ongoing development adding features such as enhanced logging, CLI commands, and build processes through early 2026.[3]
The project formalized its open-source status with an MIT license on January 12, 2026, and continued refinements to the sandbox design and MCP integration to prioritize agent compatibility, including remote CLI support and visual element handling.[1] This evolution reflected a deliberate shift toward full Playwright API exposure in a user-centric, stateful environment while maintaining security isolation.[2]
Key Distinctions from Other Tools
Playwriter distinguishes itself from other browser automation tools and Model Context Protocol (MCP) servers by granting full, unrestricted access to the Playwright API, enabling capabilities such as network request interception, log capture, locator inspection, node removal, debugging, and profiling that are often limited or unavailable in competing solutions.[2] Unlike many alternatives that launch separate browser instances, Playwriter operates directly on the user's existing Chrome browser instance through a dedicated Chrome extension, allowing automation within the same tabs, extensions, and browsing context the user already has open.[2] The tool incorporates built-in visual element labels (Vimium-style accessibility cues with color-coding, such as yellow for links and orange for buttons) to facilitate reliable element identification and interaction by AI agents, alongside remote CLI support that enables operation from different machines or environments while controlling the host browser.[2] These design choices prioritize agent compatibility and seamless integration with the user's real browsing environment while preserving security through a stateful sandbox.[2]
Architecture
Stateful Sandbox Environment
The stateful sandbox environment in Playwriter executes Playwright code snippets within a persistent, isolated context that maintains browser state across multiple commands. Unlike stateless automation tools that reset or spawn new browser instances for each operation, Playwriter operates on the user's existing Chrome instance.[1]
Sessions are created with unique identifiers and preserve state through a dedicated state object that allows variables and data to persist between successive code executions within the same session, enabling sequential operations without losing context. For example, data collected in one command, such as extracted page elements or intercepted requests, remains accessible in subsequent commands executed in the same session.[1]
Browser state—including cookies, local and session storage, DOM content, and open tabs—is preserved across agent commands because the sandbox operates directly on the user's existing Chrome instance rather than launching isolated or headless browsers, allowing continuity in logged-in sessions or multi-step workflows. Browser tabs are shared across sessions, so browser-level state persists across sessions as well. Sessions remain active until explicitly reset (or the browser/server is closed).[1]
Isolation is enforced at the session level, with each session maintaining its own independent state while sharing the underlying browser tabs. While the state object prevents direct cross-session data interference, shared tabs may allow actions in one session to affect others (e.g., via navigation or DOM changes). To avoid such interference, users can create dedicated pages per session (e.g., state.myPage = await context.newPage()).[1]
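The session model described above can be pictured with a small sketch. It is a conceptual illustration under stated assumptions, not Playwriter's implementation: `createSessionStore`, `run`, and `reset` are hypothetical names, and the plain `browser` object merely stands in for the shared tabs. Each session gets its own `state` object that survives between calls, while every session sees the same browser-level object.

```javascript
// Conceptual sketch only: per-session `state` objects over one shared browser object.
// `createSessionStore`, `run`, and `reset` are hypothetical names, not Playwriter APIs.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

function createSessionStore(sharedBrowser) {
  const sessions = new Map(); // session id -> persistent state object

  return {
    // Run a snippet with `state` (per session) and `browser` (shared) in scope.
    async run(sessionId, code) {
      if (!sessions.has(sessionId)) sessions.set(sessionId, {});
      const snippet = new AsyncFunction('state', 'browser', code);
      return snippet(sessions.get(sessionId), sharedBrowser);
    },
    // Dropping the state object models a session reset.
    reset(sessionId) {
      sessions.delete(sessionId);
    },
  };
}
```

In this model, a value written to `state` in one call is visible to the next call in the same session and invisible to other sessions, whereas a change made through `browser` is visible to all of them, which mirrors the shared-tab caveat above.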
Security boundaries include restricting control to only those tabs where the Playwriter extension icon has been activated, enforcing origin validation to allow connections solely from the authorized extension, operating the WebSocket server exclusively on localhost:19988, and blocking remote access to prevent external interference or malicious websites from connecting. These measures ensure that the sandbox provides a controlled, local-only execution environment without exposing the browser to unauthorized external control.[1]
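As an illustration of the origin check mentioned above, the following sketch accepts a connection only when its Origin header names an allow-listed extension. It is a hypothetical example: the function name and the placeholder extension ID are assumptions, and it covers only browser-originated connections, not how Playwriter treats local non-browser clients.

```javascript
// Hypothetical sketch of an origin check; the extension ID below is a placeholder.
const ALLOWED_EXTENSION_IDS = new Set(['a'.repeat(32)]);

function isAllowedOrigin(origin) {
  const prefix = 'chrome-extension://';
  if (typeof origin !== 'string' || !origin.startsWith(prefix)) return false;
  // Compare the full ID so look-alike prefixes or suffixes are rejected.
  return ALLOWED_EXTENSION_IDS.has(origin.slice(prefix.length));
}
```

Matching the whole ID against a set, rather than testing for a substring, is what keeps an origin such as an ordinary https page from passing the check.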
Playwright Integration
Playwriter integrates Playwright as its core automation engine. JavaScript snippets run in a secure, stateful sandbox inside a Node.js environment, where they call the Playwright API directly and control the user's existing Chrome browser instance via the Chrome DevTools Protocol (CDP).[1][2]
The integration provides full, unrestricted access to the Playwright API surface, exposing standard objects such as page, context, Node.js globals, and the require function, without the restricted wrappers or limited method sets common in other agent-compatible automation tools.[2]
Snippets are executed via the command-line interface using the -e flag or through the Model Context Protocol (MCP) interface, with each invocation tied to a session ID that maintains isolated execution state; variables and browser context persist across multiple calls through a dedicated state object.[2]
This execution model relies on a local WebSocket server that relays Playwright commands from agents or the CLI to the Chrome extension, enabling programmatic control via Playwright's connectOverCDP method while preserving the full API capabilities.[2]
Agents typically generate Playwright code snippets that are then submitted for execution, allowing dynamic browser interaction without intermediary translation layers beyond direct code submission.[1]
The sandbox maintains stateful persistence of browser context across executions, as detailed in the Stateful Sandbox Environment section.
Security and Isolation Model
Playwriter employs a security and isolation model designed to safely enable AI agents to control a user's existing Chrome browser instance while minimizing risks from untrusted code execution.[1] At the core is a stateful sandbox that executes Playwright code snippets. Each automation session operates with its own isolated state, ensuring that variables, page objects, and execution context do not interfere across sessions, even though browser tabs themselves may be shared.[1][5]
Access to browser control is strictly gated by user consent. Automation is permitted only on tabs where the user has actively clicked the Playwriter extension icon, which visually indicates connection status (turning green when active). This explicit consent mechanism prevents unauthorized or unintended browser manipulation.[1][5] Chrome itself enforces visibility of automation through a persistent banner displayed on controlled tabs, alerting the user to active scripting and aiding in detection of unexpected behavior.[1][5]
Network communication is restricted to localhost only. Playwriter operates a WebSocket server bound to localhost:19988, preventing external connections. Origin validation further enforces that only requests from the authorized Playwriter extension IDs are accepted, blocking origin-spoofing attempts by malicious websites. These measures ensure no remote access is possible.[1][5]
Collectively, these controls (sandboxed execution with session isolation, user-initiated consent, visible automation indicators, local-only networking, and strict origin checks) mitigate the risks of malicious or erroneous agent commands while still allowing full use of the Playwright API within the constrained environment.[1][5]
Core Capabilities
Browser Control Primitives
Playwriter exposes fundamental browser control primitives through full access to the Playwright API, enabling direct execution of essential automation commands within its stateful sandbox environment.[1] These primitives form the foundation for interacting with web pages, including navigation, element interaction, screenshot capture, and JavaScript evaluation on the page.[1]
Navigation is performed using the page.goto method to load a specified URL, for example await page.goto('https://example.com').
This command directs the controlled browser tab to the target page while maintaining session state.[1]
Element interactions rely on Playwright's locator system, supporting basic strategies such as CSS selectors, text-based matching, and ARIA roles. Clicking elements is accomplished with methods like page.click, which takes a selector directly, or locator.click, which acts on a previously resolved locator.
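Both click variants can be sketched as follows. The selector and the button name are hypothetical, and `page` is passed as a parameter only so the sketch runs outside the sandbox; inside Playwriter, `page` is already in scope and only the function bodies would be sent.

```javascript
// Illustrative only: the selector 'button#submit' and the name 'Submit' are hypothetical.
async function clickBySelector(page) {
  // Variant 1: pass a CSS selector straight to page.click.
  await page.click('button#submit');
}

async function clickByRole(page) {
  // Variant 2: resolve a locator by ARIA role and accessible name, then click it.
  await page.getByRole('button', { name: 'Submit' }).click();
}
```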
These approaches allow precise targeting of interactive elements on the page.[1]
Screenshot capture uses the page.screenshot method to generate images of the current page state.
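A minimal sketch of such a capture, assuming a caller-supplied output path; the function name is hypothetical, and `page` is a parameter only so the sketch is self-contained.

```javascript
// Illustrative helper; in the sandbox, `page` would already be in scope.
async function captureFullPage(page, path) {
  // fullPage captures the whole scrollable page rather than just the viewport.
  return page.screenshot({ path, fullPage: true });
}
```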
A specialized variant, screenshotWithAccessibilityLabels, provides screenshots accompanied by accessibility metadata to support element identification.[1]
Page evaluation executes arbitrary JavaScript within the browser context via methods such as page.$$eval or page.evaluate, enabling data extraction and manipulation, for example extracting the text content of multiple elements.
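One way to write such an extraction is sketched below. The `h2` selector is a hypothetical choice, and `page` is a parameter only so the sketch runs outside the sandbox.

```javascript
// Illustrative helper: collects the trimmed text of every element matching a selector.
async function collectHeadings(page) {
  // The callback runs in the browser and receives all elements matching 'h2'.
  return page.$$eval('h2', (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```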
This permits programmatic inspection and processing of page DOM content.[1]
Playwriter's visual element labeling system, which generates color-coded accessibility references (such as aria-ref) for locators, complements these primitives by facilitating reliable element targeting in agent-driven scenarios (detailed further in the Visual Element Labeling section).[1]
Locator and Element Inspection
Playwriter provides comprehensive support for locator creation and element inspection through its unrestricted access to the Playwright API, enabling precise querying and analysis of DOM elements in the controlled browser instance. Locators serve as the primary mechanism for finding elements, supporting a wide range of selector strategies including CSS selectors, XPath expressions, text matching, ARIA roles, placeholder text, and label associations to facilitate query-based element selection without relying on brittle identifiers.[6]
Locators in Playwriter support chaining, allowing developers or agents to refine selections progressively—for instance, starting from a container element and narrowing to specific descendants or applying additional filters such as :has-text or :visible. This chaining approach enhances locator discovery by combining multiple conditions into robust, maintainable queries that adapt to dynamic page structures.[7]
Once resolved, locators enable detailed inspection of element properties. Users can retrieve the bounding box to obtain coordinates, width, and height for positional analysis; extract visible text content using innerText() or raw text via textContent(); query individual attributes with getAttribute(); and capture the accessibility tree snapshot to examine semantic information, roles, names, and states for accessibility-focused inspection. These capabilities support thorough element analysis directly within the sandboxed Playwright execution environment.[7]
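The chaining and inspection calls described above might be combined as in the following sketch. The `nav` container and the `href` attribute are hypothetical choices, and `page` is passed in only to keep the example self-contained.

```javascript
// Illustrative helper: inspects the first visible link inside a <nav> element.
async function inspectFirstLink(page) {
  // Chain locators: start at a container, then narrow to a visible descendant.
  const link = page.locator('nav').locator('a:visible').first();
  return {
    box: await link.boundingBox(),         // position and size of the element
    text: await link.innerText(),          // rendered text
    rawText: await link.textContent(),     // raw text content
    href: await link.getAttribute('href'), // single attribute lookup
  };
}
```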
Playwriter's implementation preserves Playwright's auto-waiting behavior for locators, ensuring reliable resolution even amid asynchronous page updates or animations. For AI agent use cases, these inspection tools allow dynamic element discovery and evaluation. Visual element labeling is also available to enhance agent compatibility, as detailed in the Visual Element Labeling section.
Visual Element Labeling
Playwriter implements a visual element labeling system inspired by the Vimium browser extension. It generates screenshots of the page with short, color-coded labels overlaid on interactive elements to facilitate identification and interaction by AI agents. These labels appear as alphanumeric tags (such as "e5") on links, buttons, form inputs, and other actionable components in the screenshot, with distinct colors assigned to different element types: yellow for links, orange for buttons, coral for inputs, pink for checkboxes, peach for sliders, salmon for menus, and amber for tabs. This visual annotation scheme in screenshots provides an intuitive representation that reflects the current page state, enabling agents to reference elements more reliably than through text descriptions or coordinates alone.[2]
The labels are generated dynamically through the screenshotWithAccessibilityLabels function, which captures a screenshot of the page (with overlaid labels) and an accompanying accessibility snapshot enriched with aria-ref selectors corresponding to each labeled element. These aria-ref identifiers (e.g., aria-ref=e5) serve as stable programmatic handles that agents can use in conjunction with the visual cues in the screenshot. Labels are regenerated each time the function is invoked, ensuring they reflect any changes to the page structure or content resulting from navigation, dynamic loading, or user interactions.[2]
This labeling approach significantly enhances agent reliability by reducing ambiguity in element targeting, particularly on complex or visually dense pages where traditional selectors might prove brittle. By combining visual annotations in screenshots with accessible programmatic references, Playwriter allows AI agents to make more informed decisions during automation tasks, minimizing errors that arise from misinterpreting textual descriptions or DOM positions alone. The system complements locator inspection mechanisms (detailed in the Locator and Element Inspection section) by providing a visual layer in screenshots that bridges human-readable cues and automated control.[2][1]
Advanced Features
Network Request Interception
Playwriter supports advanced network request interception through its full access to the Playwright API, enabling users and AI agents to monitor, intercept, modify, and simulate network traffic within controlled browser tabs. This capability is executed via Playwright code snippets run in the stateful sandbox, allowing precise manipulation of HTTP requests and responses without requiring separate proxy tools or browser launches.[1]
Agents can attach event listeners to capture network activity, such as using page.on('response') to monitor responses and collect details like URLs from API calls. For instance, code can push matching response URLs into a persistent state variable for later logging or analysis, or combine it with page.waitForResponse() to synchronize actions with specific network events.[1]
Building on Playwright's route API, Playwriter permits registration of interception handlers via page.route() for URLs or predicates. Within these handlers, agents can invoke route.abort() to block requests (e.g., to skip loading images or trackers), route.continue() to allow normal processing, or route.fulfill() to supply custom response bodies, headers, and status codes. This enables practical use cases such as mocking API endpoints for frontend testing independent of backend availability, simulating failure modes or latency, and isolating components during development or automated verification.[1]
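The monitoring and interception patterns described above can be sketched together. The URL patterns, the `/api/` filter, and the mocked response body are hypothetical; `page` and `state` are parameters here only so the sketch runs outside the sandbox, where both would already be in scope.

```javascript
// Illustrative only: patterns and the mocked payload are assumptions.
async function watchAndMock(page, state) {
  // Keep collected URLs on `state` so later executions can read them.
  state.apiUrls = state.apiUrls || [];
  page.on('response', (response) => {
    if (response.url().includes('/api/')) state.apiUrls.push(response.url());
  });

  // Block image requests.
  await page.route('**/*.{png,jpg,jpeg}', (route) => route.abort());

  // Answer a hypothetical endpoint with a canned JSON body.
  await page.route('**/api/user', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ name: 'Test User' }),
    })
  );
}
```

Storing the collected URLs on `state` rather than in a local variable is what lets a later execution in the same session read them back.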
Logging and Debugging
Playwriter provides comprehensive logging and debugging support, leveraging its integration with the Chrome DevTools Protocol (CDP) and direct console output handling to enable effective troubleshooting during browser automation sessions.
Console logs from executed Playwright snippets are forwarded in real time to the command-line interface when running commands via the CLI. For example, snippets containing console.log statements—such as playwriter -e "console.log(await page.title())" or playwriter -e "console.log(state.users)"—display their output immediately in the terminal, facilitating inspection of page state, variables, or computation results without additional tooling.[1]
Error handling includes capture of relevant details in relay server logs, which encompass extension, MCP, and WebSocket server activity along with CDP events. These logs, along with a companion CDP JSONL file containing all protocol interactions, can be accessed using the playwriter logfile command to retrieve the file path (typically in /tmp/playwriter/ on Linux/macOS). This mechanism supports analysis of errors encountered during execution, including those arising in sandboxed Playwright code.[1]
Debug mode is activated programmatically through CDP by establishing a debugger session in executed snippets, such as state.cdp = await getCDPSession({ page }); state.dbg = createDebugger({ cdp: state.cdp }); await state.dbg.enable(). Once enabled, this allows listing scripts, setting breakpoints (e.g., await state.dbg.setBreakpoint({ file: url, line: number })), and performing step-by-step inspection for detailed diagnosis.[1]
In contexts involving AI agents and MCP usage, logging in core components like mcp.ts is restricted to console.error to prevent output clutter, with debugging often relying on relay log inspection, CDP JSONL analysis, or test case execution.[8]
Page Modification and Profiling
Playwriter provides advanced capabilities for modifying page content and profiling performance through direct access to the Playwright API and Chrome DevTools Protocol (CDP) integration within its stateful sandbox environment. These features enable JavaScript execution, DOM manipulation, live code editing, and debugging without custom prefixed tools. Page modification occurs primarily through Playwright API calls in executed scripts. Scripts can use methods like page.evaluate to run custom JavaScript for DOM alterations, such as node removal, style changes, or content updates. Element interactions, including form filling and typing, are supported via standard Playwright methods like page.fill and page.type on locators. Live code editing is available via CDP utilities, allowing dynamic changes to page JavaScript files, for example using state.editor.edit to replace strings in scripts. Scripts execute in a persistent state where variables like state maintain data across calls, enabling setup code in early executions to initialize or modify page behavior.[2]
Profiling capabilities include CDP-based debugging and logging. Debugging supports setting breakpoints, listing scripts, and inspecting execution via utilities like createDebugger and setBreakpoint. CDP JSONL logs capture detailed protocol traffic for performance analysis, viewable via playwriter logfile. Screenshots are captured using page.screenshot or the custom screenshotWithAccessibilityLabels function, which pairs images with accessibility data using aria-ref selectors for element identification. Accessibility snapshots are available via accessibilitySnapshot({ page }), providing structured DOM and accessibility tree representations for analysis. No native video recording is supported.[2]
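Node removal through page.evaluate, as mentioned above, might look like the following sketch. The function name is hypothetical, and `page` is a parameter only so the example is self-contained.

```javascript
// Illustrative helper: removes every node matching a selector and reports the count.
async function removeNodes(page, selector) {
  // The callback runs inside the page; the selector is passed in as an argument.
  return page.evaluate((sel) => {
    const nodes = document.querySelectorAll(sel);
    nodes.forEach((node) => node.remove());
    return nodes.length; // number of removed nodes
  }, selector);
}
```

Passing the selector as the second argument of page.evaluate matters because the callback is serialized and run in the browser, so it cannot see variables from the surrounding sandbox scope.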
Integration and Interfaces
Chrome Extension Interface
The Chrome extension serves as the primary user interface for Playwriter, enabling secure and explicit control over individual browser tabs. Users install the extension from the Chrome Web Store and activate it by clicking the extension icon in the toolbar on a desired tab, which turns the icon green to indicate successful connection and control of that tab.[2]
This click grants explicit consent for automation, ensuring that only tabs where the user has actively engaged the extension are controllable, while unclicked tabs remain disconnected and appear gray in visual representations. Chrome displays a visible automation banner on controlled tabs to clearly signal active automation.[2]
The extension's background service worker handles persistent operations, utilizing the Chrome Debugger API (chrome.debugger) to attach to the selected tab and facilitate low-level browser interaction. It communicates with a local WebSocket relay server running on localhost:19988, which acts as the intermediary for relaying Playwright commands and responses between the browser tab and the external execution environment.[2]
This architecture ensures that the extension maintains a secure, tab-specific bridge without exposing uncontrolled tabs, with all communication confined to localhost for local-only operation.[2]
Command-Line Interface (CLI)
The Command-Line Interface (CLI) of Playwriter provides a terminal-based method to execute Playwright scripts in a stateful sandbox connected to an existing Chrome browser instance through the Playwriter Chrome extension. It enables programmatic control over browser tabs without spawning a new browser, preserving user sessions, extensions, and logged-in states. The CLI requires the extension to be installed and active on target tabs, with a WebSocket connection established on localhost:19988 by default.[2]
Installation of the CLI is performed globally via npm with npm i -g playwriter.[2]
Session management is central to CLI operation. A new stateful sandbox session is created with playwriter session new, which outputs a unique session ID (such as 1). Active sessions are listed with playwriter session list, displaying session IDs and associated state keys. Sessions can be reset to resolve connection issues using playwriter session reset <id>.[2]
Playwright scripts are executed within a specified session using the syntax playwriter -s <session_id> -e "<script>", where -s specifies the session ID and -e provides the JavaScript code to run in the sandbox context. The sandbox exposes variables such as page, context, state (persistent across calls), require, and Node.js globals. For example, playwriter -s 1 -e "await page.goto('https://example.com')" navigates the session's page to the given URL.
Multiline scripts are supported by wrapping the script in the shell's $'...' quoting (a dollar sign followed by single quotes), which lets the -e argument span several lines.
Custom pages can be created within sessions to avoid interference, for example with state.myPage = await context.newPage().[2]
The CLI supports remote execution by running a server on the host machine and connecting to it from another machine. On the host, start the server with playwriter serve --token <secret>. From the remote machine, pass the --host and --token flags to the usual commands.
Environment variables PLAYWRITER_HOST and PLAYWRITER_TOKEN can alternatively be set to avoid passing flags repeatedly.[2]
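When the CLI is driven from a Node.js script rather than typed by hand, the flags named above can be assembled programmatically. The helper below is a hypothetical sketch, not part of Playwriter; it only arranges the -s, -e, --host, and --token flags described in this section into an argument list.

```javascript
// Hypothetical helper: builds the argument list for a `playwriter` invocation
// from the flags described in this section (-s, -e, --host, --token).
function playwriterArgs({ session, script, host, token }) {
  const args = [];
  if (host) args.push('--host', host);    // remote host, optional
  if (token) args.push('--token', token); // shared secret, optional
  args.push('-s', String(session), '-e', script);
  return args;
}
```

The resulting array can be handed to execFile from Node's child_process module, which passes each argument unchanged and so avoids shell-quoting problems with scripts that contain quotes. Leaving out `host` and `token` yields a local invocation, in which case the PLAYWRITER_HOST and PLAYWRITER_TOKEN environment variables can supply them instead.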
The CLI can be invoked via npx for one-off or server usage without global installation, such as npx -y playwriter serve --host 127.0.0.1 to start a server on a specific host.[2]
AI Agent and MCP Connectivity
Playwriter integrates with AI agents by acting as a Model Context Protocol (MCP) server, exposing a standardized interface that allows large language models and agents to control an existing Chrome browser instance through Playwright code executed in a secure, stateful sandbox.[1]
The primary mechanism for AI agent connectivity is the MCP interface, which uses a local WebSocket server running on localhost:19988. The Chrome extension communicates with this server over WebSocket endpoints (/extension and /cdp/:id), while external MCP clients (AI agents) connect to the same server to send commands. This architecture enables agents to issue Playwright instructions without needing direct access to the browser process, ensuring isolation and security.[1]
The MCP exposes a minimal, powerful set of tools:
- execute: The main tool, which accepts arbitrary Playwright code snippets to run in the sandboxed environment. This provides full, unrestricted access to the Playwright API (including network interception, log capture, locator inspection, node removal, debugging, and profiling). The sandbox maintains stateful variables (page, context, state) across multiple execute calls within the same session.[1]
- reset: Re-establishes the connection if issues arise during operation.[9]
To integrate Playwriter with an AI agent, the recommended and simplest method is to add it as a skill with a single command.
This automatically configures the MCP server and makes the execute and reset tools available to the agent. Playwriter is compatible with any MCP-compliant client.[1]
For remote agents (such as those running in devcontainers, VMs, or over SSH), Playwriter supports a token-based authentication handshake:
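- On the host machine (where Chrome and the extension run), start the server with playwriter serve --token <secret>.
- On the remote machine (where the AI agent runs), configure the MCP client with the host address and token, either via the --host and --token command-line arguments or via the PLAYWRITER_HOST and PLAYWRITER_TOKEN environment variables.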
This setup allows the remote AI agent to securely drive the host browser.[1]
When PLAYWRITER_AUTO_ENABLE=1 is set (default in CLI mode), a blank tab is automatically created and enabled for control upon connection, simplifying agent workflows. Visual element labeling (detailed in the Visual Element Labeling section) further assists agents by providing identifiable aria-ref selectors for interaction.[1]
Usage and Examples
Installation and Basic Setup
Playwriter is installed via its Chrome extension and a companion command-line interface (CLI), enabling control of an existing browser instance through Playwright code in a secure sandbox. The extension provides the primary interface for connecting to tabs, while the CLI allows execution of commands and session management.
The Chrome extension, titled Playwriter MCP, can be installed from the Chrome Web Store when available, where it requests standard permissions for tab interaction and debugging, or loaded unpacked from the GitHub repository source code for development. After installation, basic activation involves clicking the extension icon on any desired tab; the icon turns green to confirm connection. This step ensures explicit user consent (only tabs where the user has manually activated the extension are controllable), and Chrome displays a visible automation banner on affected tabs.[3][1]
The CLI component can be run via npx (e.g., npx playwriter) or installed globally with npm i -g playwriter. Both routes require Node.js and npm to be installed on the system. Once installed globally, the CLI is available as the playwriter command for session creation and management.[1]
For first-run configuration, after setting up both components, users create an initial session with the command playwriter session new. This generates a stateful sandbox environment and outputs a unique session ID (for example, 1) that is used to reference the session in subsequent CLI interactions. No additional configuration files or environment variables are required for basic local use, though remote access can be enabled separately via the playwriter serve command with a token.[1]
Running the Skill Demonstration
The Playwriter skill demonstration is executed via the CLI to illustrate the tool's core functionality: executing Playwright code snippets in a stateful, secure sandbox connected to an existing Chrome browser instance through the extension.[2]
To run the demonstration, first install the Playwriter MCP Chrome extension from the Chrome Web Store and connect it to a tab by clicking the extension icon (the icon turns green when connected).[3] Install the Playwriter CLI globally with npm i -g playwriter.[2]
Create a new session with the command playwriter session new, which initializes a stateful sandbox and outputs a session ID (typically a number such as 1).[2]
Execute a basic Playwright command in the session by specifying the session ID with the -s flag and the code with the -e flag, for example: playwriter -s 1 -e "await page.goto('https://example.com')". This navigates the connected browser tab to the specified URL.[2]
To verify the action and observe output, run playwriter -s 1 -e "console.log(await page.title())". The command returns the page title (e.g., "Example Domain") in the terminal, confirming successful execution of Playwright code and state persistence across commands.[2]
The demonstration further highlights Playwriter-specific features by using playwriter -s 1 -e "console.log(await accessibilitySnapshot({ page }))", which outputs a structured accessibility tree of the page, and playwriter -s 1 -e "await page.locator('aria-ref=e5').click()", which clicks an element using an ARIA reference derived from the extension's visual labels.[2]
These steps showcase unrestricted Playwright API access (including page navigation, accessibility inspection, and locator usage) in a sandboxed environment, with visual element labels overlaid on the page for AI compatibility, while preserving the user's existing browser state and extensions.[2][1]
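The command sequence above can also be assembled from a script. The following is a minimal sketch, assuming a Node.js caller; the helper name buildPlaywriterArgs is hypothetical, and only the -s and -e flags described above are used. Keeping each snippet as a single argument-vector entry sidesteps shell quoting problems with nested quotes.

```javascript
// Hypothetical helper: builds the argument vector for one CLI invocation.
// Only the -s (session ID) and -e (code) flags from the steps above are used.
function buildPlaywriterArgs(sessionId, code) {
  if (typeof code !== 'string' || code.trim() === '') {
    throw new TypeError('code must be a non-empty string');
  }
  return ['-s', String(sessionId), '-e', code];
}

// The demonstration steps, expressed as argument vectors for session 1.
const demoSteps = [
  "await page.goto('https://example.com')",
  'console.log(await page.title())',
  'console.log(await accessibilitySnapshot({ page }))',
].map((code) => buildPlaywriterArgs(1, code));
```

Each entry could then be handed to Node's child_process.execFile with playwriter as the command, which passes arguments without going through a shell.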
Agent-Driven Browser Control Examples
Playwriter enables AI agents to drive browser interactions by executing Playwright code snippets within a stateful sandbox connected to the user's existing Chrome instance via the extension. This approach supports progressively complex automation tasks, leveraging full Playwright API access alongside visual accessibility labels (such as color-coded aria-ref identifiers) that aid agents in reliably locating and interacting with elements without relying solely on screenshots.[1][10]
For simple navigation and interaction, an agent can execute basic Playwright commands to visit a page and engage with an element identified by its accessibility label. A representative example involves navigating to a target URL and clicking a labeled button:
In a selector such as aria-ref=e5, e5 is a dynamically assigned label overlaid on the element for agent-friendly identification; labels are color-coded by element type (e.g., yellow for links, orange for buttons).[1]
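A minimal sketch of such a snippet follows. It is written as a function that receives page as a parameter so it can be read in isolation; in a Playwriter session page is supplied by the sandbox. The function name openAndClick, the URL, and the label e5 are illustrative assumptions.

```javascript
// Sketch: navigate, then click an element by its aria-ref label.
// `page` is a Playwright Page; inside a Playwriter session the sandbox provides it.
async function openAndClick(page, url, ref) {
  await page.goto(url);
  // The aria-ref value (e.g. "e5") would come from the visual labels
  // or from an accessibility snapshot of the current page.
  await page.locator(`aria-ref=${ref}`).click();
}
```

Inside a session, the same two statements could be passed to the CLI with the -e flag, for example openAndClick(page, 'https://example.com', 'e5') expressed as its two awaited calls.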
Multi-step tasks build on session state persistence, allowing agents to extract data in one execution and reuse it in subsequent ones without reinitializing the context. For instance, an agent might extract user information from a page and store it for later processing:
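The extraction step might look like the sketch below. The selectors .user-row, .name, and .email and the key state.users are hypothetical; locator(...).evaluateAll is the standard Playwright call used to map over matching elements. The function form is only for readability, since the sandbox supplies page and state directly.

```javascript
// Sketch: collect rows from the page and keep them in the session state.
// All selectors and the `users` key are assumptions for illustration.
async function extractUsers(page, state) {
  state.users = await page.locator('.user-row').evaluateAll((rows) =>
    rows.map((row) => ({
      name: row.querySelector('.name')?.textContent?.trim() ?? '',
      email: row.querySelector('.email')?.textContent?.trim() ?? '',
    }))
  );
  // Return a small summary rather than the full data set.
  return state.users.length;
}
```

Returning only a count keeps the output sent back to the agent short, while the full data stays in state for later executions.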
A follow-up execution can then access and manipulate the stored data:
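As a sketch of such a follow-up, assume an earlier execution stored an array of { name, email } objects under the hypothetical key state.users. A later snippet can filter it and store a derived result without touching the page again:

```javascript
// Sketch: reuse data that a previous execution left in the session state.
// `state.users` and `state.validUsers` are hypothetical keys.
function summarizeUsers(state) {
  const users = state.users ?? [];
  // Keep only entries whose email looks populated.
  state.validUsers = users.filter((u) => u.email.includes('@'));
  return `${state.validUsers.length} of ${users.length} users have an email`;
}
```

Because state persists within the session, state.validUsers would remain available to any subsequent execution as well.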
This stateful pattern supports workflows such as form filling after data extraction or conditional actions based on prior results.[1]
More advanced scenarios incorporate network interception for monitoring or validation. An agent can attach listeners to capture API responses during interactions:
A subsequent execution can then trigger an action and inspect the captured data:
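A listener of this kind could be sketched as follows, using Playwright's page.on('response', ...) event together with response.url() and response.status(). The URL fragment /api/ and the key state.responses are assumptions chosen for illustration.

```javascript
// Sketch: record matching responses into the session state.
// `urlPart` and the `responses` key are hypothetical.
function captureApiResponses(page, state, urlPart = '/api/') {
  state.responses = [];
  page.on('response', (response) => {
    // Ignore assets and other traffic that does not match the filter.
    if (response.url().includes(urlPart)) {
      state.responses.push({ url: response.url(), status: response.status() });
    }
  });
}
```

Storing only the URL and status keeps the captured data small; a real flow might also read response bodies where needed.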
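The sketch below assumes an earlier execution has been pushing { url, status } entries into the hypothetical key state.responses. It clicks a labeled element, waits for network activity to settle with Playwright's page.waitForLoadState('networkidle'), and reports any failed calls.

```javascript
// Sketch: trigger an action, then inspect what the listener recorded.
// `state.responses` is a hypothetical key filled by a previously attached listener.
async function clickAndReport(page, state, ref) {
  await page.locator(`aria-ref=${ref}`).click();
  // Wait until the network has been quiet briefly so late responses are recorded.
  await page.waitForLoadState('networkidle');
  const responses = state.responses ?? [];
  const failed = responses.filter((r) => r.status >= 400);
  return { total: responses.length, failed };
}
```

An agent could branch on the failed list, for example retrying the action or reporting the failing URL.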
Such patterns enable agents to verify backend calls or handle authenticated flows in multi-step processes.[1]
Agents can also request screenshots annotated with accessibility labels to inform decisions, particularly for vision-enabled models. The following code captures the current page with overlaid labels:
The agent can then analyze the labeled screenshot and issue subsequent commands using aria-ref locators, improving reliability for dynamic or visually complex interfaces.[1]
Error handling in agent flows typically involves incorporating try-catch blocks within Playwright snippets to manage exceptions locally, or leveraging debugging tools such as CDP sessions for breakpoint setting and state inspection when failures occur. Agents can reference session logs or reset states to recover from issues while maintaining the overall workflow.[1]
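A hedged sketch of this step is shown below. The capture parameter stands in for the sandbox's screenshotWithAccessibilityLabels helper; the ({ page }) call shape is an assumption modeled on accessibilitySnapshot({ page }), and the shape of the returned value is not assumed. The keys lastCapture and lastCaptureAt are hypothetical.

```javascript
// Sketch: take a labeled screenshot and remember it in the session state.
// `capture` is expected to be the sandbox's screenshotWithAccessibilityLabels helper;
// calling it as capture({ page }) is an assumption, not a confirmed signature.
async function captureLabeledView(page, state, capture) {
  state.lastCapture = await capture({ page });
  state.lastCaptureAt = Date.now();
  return state.lastCapture;
}
```

In a session this might be invoked as captureLabeledView(page, state, screenshotWithAccessibilityLabels).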
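A local try-catch of the kind described might be sketched as follows. The function name safeClick and the five-second default are illustrative assumptions; the timeout option on locator.click is part of the standard Playwright API.

```javascript
// Sketch: report a failure as data instead of letting the snippet throw,
// so the agent can decide how to recover.
async function safeClick(page, ref, timeoutMs = 5000) {
  try {
    await page.locator(`aria-ref=${ref}`).click({ timeout: timeoutMs });
    return { ok: true };
  } catch (error) {
    // Typical causes: stale label after a re-render, or the element never appearing.
    return { ok: false, message: error.message };
  }
}
```

Returning a structured result lets a follow-up execution refresh the accessibility snapshot and retry with a new label rather than aborting the workflow.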
Comparison with Alternatives
Differences from Other MCPs
Playwriter differentiates itself from other Model Context Protocol (MCP) servers, particularly the official Playwright MCP server, by operating directly on the user's existing Chrome browser instance through a Chrome extension rather than spawning a new, isolated Chrome process for each task. This enables stateful control of the user's real browser, preserving logged-in sessions, cookies, extensions, and tab states across operations, while allowing the user and AI agents to work together in the same browser window.[2][1]
In contrast to many MCPs that expose limited actions through multiple dedicated tools, Playwriter grants unrestricted access to the full Playwright API within a stateful sandbox. This includes advanced capabilities such as network request interception (e.g., page.on('response', ...)), log capture via relay and CDP JSONL logs, locator inspection through accessibility snapshots, node removal or live code editing (e.g., state.editor.edit(...)), debugging with breakpoints (e.g., state.dbg.setBreakpoint(...)), and profiling. All functionality is delivered via a single execute tool that runs arbitrary Playwright code snippets, minimizing context bloat and token consumption for AI agents compared to multi-tool approaches.[2][1]
Playwriter further provides visual element labels in a Vimium-style system, assigning color-coded accessibility markers to interactive elements (yellow for links, orange for buttons, coral for inputs, etc.) and using aria-ref selectors (e.g., aria-ref=e5) for reliable identification. Screenshots can include these labels, aiding agent navigation and understanding. This contrasts with text-only or raw accessibility-tree interfaces common in other MCPs.[5]
Playwriter also supports remote CLI execution over WebSocket, allowing agents or developers to control a host browser from a separate environment (e.g., devcontainer or VM) using token-based authentication, while maintaining the same extension-driven, stateful interaction model.[2]
Advantages over Standard Playwright Scripts
Playwriter offers several advantages over running standard Playwright scripts, which typically involve launching a new browser instance (often headless) for each automation task.
Playwriter controls the user's existing Chrome browser instance through a Chrome extension, rather than spawning a new browser window or process. This enables persistent sessions that preserve the user's logged-in state, cookies, extensions, and open tabs across multiple operations. Standard Playwright scripts, by contrast, generally create isolated, fresh browser contexts that do not inherit the user's live environment unless explicitly configured to do so.[1][2]
Each Playwriter session runs in a stateful sandbox where variables such as state, page, and context persist between executions within the same session ID. This allows agents to maintain and build upon data (for example, storing intercepted network requests or extracted page elements) without re-initializing the environment on every call. Standard Playwright scripts lack this built-in session persistence and require manual state management across separate invocations.[1][2]
Playwriter provides an agent-friendly interface centered on a single execute tool that grants full access to the Playwright API, including advanced capabilities such as network interception, log capture, breakpoints, live code editing, and raw CDP access. Agents can write and run Playwright code directly without learning a large set of specialized tools or dealing with high context overhead. Raw Node.js Playwright scripts require full programmatic setup and lack this streamlined, single-invocation model tailored for agent workflows.[1][2]
Security is enhanced through a controlled sandbox that enforces several safeguards: the WebSocket server operates only on localhost:19988, origin validation restricts connections to the Playwriter extension, control requires explicit user consent (by clicking the extension icon on a tab), and an automation banner is displayed in Chrome on controlled tabs. These measures reduce risks associated with direct browser control in standard Playwright scripts, where automation may occur without built-in visibility or consent mechanisms.[1][2]
Because Playwriter operates on the user's existing browser, it may be less likely to trigger bot detection than standard Playwright scripts, which launch a new, isolated instance that is often flagged as automated.[1][2]
Documentation and Community
Official Documentation
The official documentation for Playwriter is hosted on its GitHub repository at https://github.com/remorses/playwriter, where the README.md file serves as the primary and comprehensive source of information.[1] The documentation begins with an overview of the tool's core features: the Chrome extension for connecting to existing browser tabs, the CLI for local execution, and the MCP interface for AI agent compatibility. It emphasizes the stateful sandbox that executes Playwright code with unrestricted access to the full Playwright API, including network interception, log capture, locator inspection, node removal, debugging, and profiling capabilities.[2] It highlights the single execute tool as the primary mechanism for agents to send Playwright snippets or CDP commands, with explanations of how visual element labels and remote CLI support enhance agent compatibility without context bloat.[2]
No separate API reference is provided within the Playwriter documentation. Since Playwriter exposes the complete Playwright API surface, users can refer to the official Playwright API documentation for detailed method signatures, classes, and usage patterns.[11]
Tutorials and examples are organized directly within the README.md, focusing on practical agent-driven usage scenarios, tool call formats, and configurations for different agent frameworks, with supplementary guidance in files such as PLAYWRITER_AGENTS.md for agent-specific workflows and development.[12] The documentation briefly references the skill demonstration as a showcase of core capabilities, with further details available in dedicated sections.
Community Resources and Support
The primary source of community resources and support for Playwriter is its official GitHub repository at https://github.com/remorses/playwriter. Users can report bugs, request features, ask questions, and track ongoing development through the issue tracker.[13] The project is open-source under the MIT license and actively maintained, with recent commits and updates reflecting community-driven improvements. Contributions are encouraged via pull requests, with a dedicated contributing section in the repository providing guidance for those interested in submitting code or documentation changes. No additional official forums, Discord servers, or dedicated community platforms are referenced in the project's documentation or repository. Users seeking support or discussion typically use the GitHub issues system.[1]