A text-grid web renderer for AI agents: see the web without screenshots.
Instead of taking expensive screenshots and piping them through vision models, TextWeb renders web pages as structured text grids that LLMs can reason about natively. Full JavaScript execution, spatial layout preserved, interactive elements annotated.
Documentation · npm · GitHub
Why?
| Approach | Size | Requires | Speed | Spatial Layout |
|---|---|---|---|---|
| Screenshot + Vision | ~1MB | Vision model ($$$) | Slow | Pixel-level |
| Accessibility Tree | ~5KB | Nothing | Fast | ✗ Lost |
| Raw HTML | ~100KB+ | Nothing | Fast | ✗ Lost |
| TextWeb | ~2-5KB | Nothing | Fast | ✓ Preserved |
Quick Start
npm install -g textweb
npx playwright install chromium

# Render any page
textweb https://news.ycombinator.com

# Explicitly request grid mode (same as default)
textweb --output grid https://news.ycombinator.com

# Semantic JSON output for agent workflows
textweb --output semantic https://example.com

# Hybrid output (grid + semantic metadata)
textweb --output hybrid https://example.com

# Interactive mode
textweb --interactive https://github.com

# Legacy JSON output (backward compatible)
textweb --json https://example.com
Example Output
[0]Hacker News [1]new | [2]past | [3]comments | [4]ask | [5]show | [6]jobs | [7]submit [8]login
1. [9]Show HN: TextWeb - text-grid browser for AI agents (github.com)
142 points by chrisrobison 3 hours ago | [10]89 comments
2. [11]Why LLMs don't need screenshots to browse the web
87 points by somebody 5 hours ago | [12]34 comments
[13:______________________] [14 Search]
~500 bytes. An LLM can read this, understand the layout, and say "click ref 9" to open the first link. No vision model needed.
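As a rough illustration of how agent-side code could pick a reference out of a grid like this, the sketch below scans the text for `[n]label` link markers. The helper name `find_link_refs` and the regular expression are assumptions made for this example and are not part of TextWeb, which returns its own element map alongside the grid.

```python
import re

# Hypothetical helper, not a TextWeb API. Matches link-style markers such
# as "[9]Show HN"; the label stops at the next "[", a "|" separator, or
# the end of the line. Inputs ("[13:____]") and buttons ("[14 Search]")
# are deliberately not matched.
LINK_REF = re.compile(r"\[(\d+)\]([^\[\|\n]+)")

def find_link_refs(grid: str) -> dict[int, str]:
    """Return {ref: label} for every "[n]label" marker in the grid text."""
    return {int(m.group(1)): m.group(2).strip() for m in LINK_REF.finditer(grid)}

sample = (
    "[0]Hacker News [1]new | [2]past | [8]login\n"
    "1. [9]Show HN: TextWeb (github.com)\n"
    "   142 points | [10]89 comments\n"
    "[13:______] [14 Search]\n"
)
refs = find_link_refs(sample)
```

An agent could then search `refs` for a keyword and issue a click on the matching reference number.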
Integration Options
TextWeb works with any AI agent framework. Pick your integration:
MCP Server (Claude Desktop, Cursor, Windsurf, Cline, OpenClaw)
The fastest way to add web browsing to any MCP-compatible client.
# Install globally
npm install -g textweb

# Or run directly
npx textweb-mcp

Claude Desktop: add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"textweb": {
"command": "textweb-mcp"
}
}
}

Cursor: add to .cursor/mcp.json:
{
"mcpServers": {
"textweb": {
"command": "textweb-mcp"
}
}
}

OpenClaw: add to openclaw.json skills or MCP config.
Then just ask: "Go to hacker news and find posts about AI". The agent uses text grids instead of screenshots.
New (v0.2.1-style MCP capabilities):
- session_id on every tool call for isolated parallel workflows
- textweb_storage_save / textweb_storage_load for persistent auth/session state
- textweb_wait_for for multi-step async UI transitions
- textweb_assert_field for flow guards before submit
OpenAI / Anthropic Function Calling
Drop-in tool definitions for any function-calling model. See tools/tool_definitions.json.
Pair with the system prompt to teach the model how to read the grid:
import json

import openai

# Load tool definitions
with open("tools/tool_definitions.json") as f:
    textweb_tools = json.load(f)["tools"]

# Load system prompt
with open("tools/system_prompt.md") as f:
    system_prompt = f.read()

# Use with OpenAI (the client reads OPENAI_API_KEY from the environment)
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Go to example.com and click the first link"},
    ],
    tools=textweb_tools,
)
LangChain
from langchain.agents import initialize_agent
from tools.langchain import get_textweb_tools

# Start the server first: textweb --serve 3000
tools = get_textweb_tools(base_url="http://localhost:3000")

# Use with any LangChain agent (`llm` is a LangChain model you have already created)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
agent.run("Find the top story on Hacker News")
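CrewAI
from crewai import Agent
from tools.crewai import TextWebBrowseTool, TextWebClickTool, TextWebTypeTool

# Start the server first: textweb --serve 3000
# `llm` is a model you have already configured.
researcher = Agent(
    role="Web Researcher",
    # CrewAI agents also expect a goal and backstory; these values are placeholders.
    goal="Find information on the web",
    backstory="Browses pages through TextWeb text grids",
    tools=[TextWebBrowseTool(), TextWebClickTool(), TextWebTypeTool()],
    llm=llm,
)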
HTTP API
# Start the server
textweb --serve 3000

# Navigate
curl -X POST http://localhost:3000/navigate \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'

# Click, type, scroll
curl -X POST http://localhost:3000/click -d '{"ref": 3}'
curl -X POST http://localhost:3000/type -d '{"ref": 7, "text": "hello"}'
curl -X POST http://localhost:3000/scroll -d '{"direction": "down"}'
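For agents written in Python, the same endpoints could be called with the standard library alone. The following is a minimal sketch, assuming the server was started with `textweb --serve 3000` and accepts JSON bodies as in the curl examples. The helper names `build_request` and `call` are made up for illustration, and the response is returned as raw text because its exact shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumes `textweb --serve 3000` is running

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints shown above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call(endpoint: str, payload: dict) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=30) as resp:
        return resp.read().decode("utf-8")

# Building a request does not touch the network; call() does.
click = build_request("click", {"ref": 3})
```

With the server running, `call("navigate", {"url": "https://example.com"})` followed by `call("click", {"ref": 3})` would mirror the curl sequence.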
Node.js Library
const { AgentBrowser } = require('textweb');

// `await` is not allowed at the top level of a CommonJS file, so wrap the calls.
(async () => {
  const browser = new AgentBrowser({ cols: 120 });
  const { view, elements, semantic, meta } = await browser.navigate('https://example.com');

  console.log(view);       // The text grid
  console.log(elements);   // { 0: { selector, tag, text, href }, ... }
  console.log(semantic);   // { mode, url, title, elements: [...] }
  console.log(meta.stats); // { totalElements, interactiveElements, renderMs }

  await browser.click(3);         // Click element [3]
  await browser.type(7, 'hello'); // Type into element [7]
  await browser.scroll('down');   // Scroll down
  await browser.waitFor({ selector: '.step-2.active' });           // Wait for next step
  await browser.assertField(7, 'hello', { comparator: 'equals' }); // Validate field state
  await browser.saveStorageState('/tmp/textweb-state.json');
  await browser.loadStorageState('/tmp/textweb-state.json');
  await browser.query('nav a');   // Find elements by CSS selector
  await browser.screenshot();     // PNG buffer (for debugging)
  console.log(browser.getCurrentUrl()); // Current page URL
  await browser.close();
})();
Grid Conventions
| Element | Rendering | Interaction |
|---|---|---|
| Links | [ref]link text | click(ref) |
| Buttons | [ref button text] | click(ref) |
| Text inputs | [ref:placeholder____] | type(ref, "text") |
| Checkboxes | [ref:X] / [ref: ] | click(ref) to toggle |
| Radio buttons | [ref:●] / [ref:○] | click(ref) |
| Dropdowns | [ref:▼ Selected] | select(ref, "value") |
| File inputs | [ref:📎 Choose file] | upload(ref, "/path") |
| Headings | ═══ HEADING ═══ | - |
| Separators | ──────────────── | - |
| List items | • Item text | - |
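To make the conventions concrete, here is a small sketch that classifies a single annotated token and suggests the matching interaction. It covers only links, buttons, text inputs, and checkboxes. The patterns and the names `classify` and `ACTIONS` are assumptions for this example rather than TextWeb's own parser, and a text input is recognised only when its placeholder ends in underscores.

```python
import re

# Illustrative patterns derived from the table above (not TextWeb code).
# Order matters: the specific "ref:" forms are tried before generic ones.
PATTERNS = [
    ("checkbox", re.compile(r"\[(\d+):([X ])\]")),
    ("text_input", re.compile(r"\[(\d+):([^\]]*_+)\]")),
    ("button", re.compile(r"\[(\d+) ([^\]]+)\]")),
    ("link", re.compile(r"\[(\d+)\](\S[^\[\|\n]*)")),
]

# Interaction suggested by the table for each element kind.
ACTIONS = {"checkbox": "click", "text_input": "type", "button": "click", "link": "click"}

def classify(token: str):
    """Return (kind, ref, action) for a grid token, or None if unrecognised."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(token)
        if match:
            return kind, int(match.group(1)), ACTIONS[kind]
    return None
```

For example, `classify("[14 Search]")` identifies a button with ref 14, while plain text returns `None`.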
How It Works
┌─────────────────────────────────────────────┐
│ Your Agent (any LLM)                        │
│ "click 3" / "type 7 hello" / "scroll down"  │
├─────────────────────────────────────────────┤
│ TextWeb                                     │
│ Pixel positions → character grid            │
│ Interactive elements get [ref] annotations  │
├─────────────────────────────────────────────┤
│ Headless Chromium (Playwright)              │
│ Full JS/CSS execution                       │
│ getBoundingClientRect() for all elements    │
└─────────────────────────────────────────────┘
- Real browser renders the page (full JS, CSS, dynamic content)
- Extract every visible element's position, size, text, and interactivity
- Map pixel coordinates to character grid positions (spatial layout preserved)
- Annotate interactive elements with [ref] numbers for agent interaction
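The pixel-to-grid step can be pictured with a short sketch. This is not TextWeb's implementation: the 1200 px viewport width, the 20 px line height, and the names `Rect` and `to_cell` are assumptions chosen to keep the arithmetic simple. Only the 120-column grid width comes from the `cols: 120` option shown earlier.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Pixel box with the x, y, width and height fields that getBoundingClientRect() reports."""
    x: float
    y: float
    width: float
    height: float

def to_cell(rect: Rect, viewport_width: float = 1200.0, cols: int = 120,
            line_height: float = 20.0) -> tuple[int, int, int]:
    """Map a pixel rect to (row, col, width_in_chars) on the character grid."""
    char_width = viewport_width / cols           # pixels per grid column
    col = min(cols - 1, max(0, int(rect.x // char_width)))
    row = max(0, int(rect.y // line_height))
    span = max(1, round(rect.width / char_width))
    return row, col, min(span, cols - col)       # clamp so the span stays on the grid
```

With these assumed defaults, an element at x=300, y=45 that is 80 px wide lands on row 2, column 30, and spans 8 characters.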
Selector Strategy
TextWeb builds stable CSS selectors for each interactive element, preferring resilient strategies over brittle positional ones:
| Priority | Strategy | Example |
|---|---|---|
| 1 | #id | #email |
| 2 | [data-testid] | [data-testid="submit-btn"] |
| 3 | [aria-label] | input[aria-label="Search"] |
| 4 | [role] (if unique) | [role="navigation"] |
| 5 | [name] | input[name="email"] |
| 6 | a[href] (if unique) | a[href="/about"] |
| 7 | nth-child (fallback) | div > a:nth-child(3) |
This means selectors tend to survive DOM changes between snapshots, which is critical for multi-step agent workflows.
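The priority order in the table can be read as a simple cascade. The sketch below is one possible rendering of it, assuming the caller supplies the element's attributes, a uniqueness check against the page, and a precomputed nth-child path. The function name and parameters are hypothetical, and attribute values are not escaped.

```python
from typing import Callable, Mapping

def build_selector(tag: str, attrs: Mapping[str, str],
                   is_unique: Callable[[str], bool], fallback: str) -> str:
    """Pick a selector following the priority table (illustrative only)."""
    if attrs.get("id"):
        return f"#{attrs['id']}"
    if attrs.get("data-testid"):
        testid = attrs["data-testid"]
        return f'[data-testid="{testid}"]'
    if attrs.get("aria-label"):
        label = attrs["aria-label"]
        return f'{tag}[aria-label="{label}"]'
    if attrs.get("role"):
        candidate = f'[role="{attrs["role"]}"]'
        if is_unique(candidate):          # role is used only when unambiguous
            return candidate
    if attrs.get("name"):
        name = attrs["name"]
        return f'{tag}[name="{name}"]'
    if tag == "a" and attrs.get("href"):
        candidate = f'a[href="{attrs["href"]}"]'
        if is_unique(candidate):          # same uniqueness rule for links
            return candidate
    return fallback                       # positional nth-child path, last resort
```

The positional fallback is reached only when no attribute-based strategy applies, which is why it sits last in the table.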
ATS Workflow Examples (Greenhouse / Lever)
For multi-step ATS flows, use a stable session_id and combine wait/assert guards:
// Keep one session for the whole application
await textweb_navigate({ url: 'https://job-boards.greenhouse.io/acme/jobs/123', session_id: 'apply-acme' });

// Fill + continue
await textweb_type({ ref: 12, text: 'Christopher', session_id: 'apply-acme' });
await textweb_type({ ref: 15, text: 'Robison', session_id: 'apply-acme' });
await textweb_click({ ref: 42, session_id: 'apply-acme', retries: 3, retry_delay_ms: 400 });

// Guard transition
await textweb_wait_for({ selector: '#step-2.active', timeout_ms: 8000, session_id: 'apply-acme', retries: 2 });

// Validate before submit
await textweb_assert_field({ ref: 77, expected: 'San Francisco', comparator: 'includes', session_id: 'apply-acme' });

// Persist auth/session for follow-up flow
await textweb_storage_save({ path: '/tmp/ats-state.json', session_id: 'apply-acme' });
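The `retries` and `retry_delay_ms` parameters above suggest a retry-with-delay loop around each action. How TextWeb implements this internally is not stated here, so the sketch below is only a generic illustration of the pattern. It assumes `retries` counts additional attempts after the first, and the helper name `with_retries` is hypothetical.

```python
import time

def with_retries(action, retries: int = 3, retry_delay_ms: int = 400):
    """Call action() until it succeeds, allowing `retries` extra attempts.

    Assumption: `retries` means attempts after the first one; the real
    TextWeb semantics may differ.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < retries:
                time.sleep(retry_delay_ms / 1000)  # wait before the next attempt
    raise last_error
```

A click that fails while a page transition is still in progress would be attempted again after the delay, and the last error is raised once the attempts are exhausted.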
Useful session tools:
- textweb_session_list: inspect active sessions
- textweb_session_close: close one session or all
App Runtime Prototype (Manifest + LARC)
This repository now includes an early scaffold for a manifest-driven user runtime shell (separate from low-level raw admin tooling):
- Manifest validator: src/app-runtime/manifest.js
- PAN topic contract: src/app-runtime/topics.js
- Runtime shell + left nav + tabbed content: canvas/app-runtime/app-shell.html
- Sample manifest: canvas/app-runtime/sample-app.json
The runtime shell uses LARC PAN (@larcjs/core-lite) for in-page event communication, with a local fallback bus if the module cannot be loaded.
To open the prototype:
# Start API server for integration hooks (save manifest/components)
npm run serve

# In another terminal, open the runtime shell (path relative to the repository root)
open canvas/app-runtime/app-shell.html
Integration actions implemented on the API server:
- POST /integrations/sync_saved_form
- POST /integrations/save_manifest
- POST /integrations/upsert_nav_item
- POST /integrations/runtime_state
Testing
# Run all tests (form + live + ATS e2e)
npm test

# Form fixture tests
npm run test:form

# Live site tests: example.com, HN, Wikipedia
npm run test:live

# ATS multi-step fixture test
npm run test:ats
Test fixtures are in test/fixtures/, including a comprehensive HTML form and an ATS-style multi-step application fixture.
Design Principles
- Text is native to LLMs: no vision model middleman
- Spatial layout matters: flat element lists lose the "where"
- Cheap and fast: 2-5KB per render vs 1MB+ screenshots
- Full web support: real Chromium runs the JS
- Interactive: reference numbers map to real DOM elements
License
MIT © Christopher Robison