snag
Intelligently fetch web page content using a browser engine.
Why snag?
Built for AI agents to consume web content efficiently.
Modern AI agents need web content in clean, token-efficient formats. snag solves this by:
- Markdown output - AI models work better with markdown than HTML (70% fewer tokens)
- Real browser rendering - Handles JavaScript, dynamic content, lazy loading automatically
- Authentication support - Access private/authenticated pages through persistent browser sessions
- Tab management - List, select, and reuse existing browser tabs without creating new ones
- Content archival - Build reference libraries of web content for future AI agent use
- Simple CLI interface - One command, clean output, no complex automation scripts
Perfect for:
- AI coding assistants fetching documentation
- Building knowledge bases from authenticated sites
- Capturing dynamic web content for analysis
- Piping web content into AI processing pipelines
- Taking page screenshots for CSS/Style analysis
Quick Start
# Install via Homebrew
brew tap grantcarthew/tap
brew install grantcarthew/tap/snag
# Fetch a page as Markdown (default format)
snag example.com
# Save to file
snag docs.example.com > docs.md
That's it! snag auto-detects your Chromium-based browser and handles everything else.
Installation
Prerequisites
snag requires a Chromium-based (Chrome) browser:
Linux:
# Ubuntu/Debian
sudo apt update && sudo apt install chromium-browser
# Fedora (dnf check-update exits non-zero when updates exist, so it is not chained with &&)
sudo dnf install chromium
# Arch Linux
sudo pacman -Sy chromium
# Homebrew
brew install chromium
macOS:
# Chromium (recommended) via Homebrew
brew install chromium
# or Chrome - download from https://www.google.com/chrome/
Supported browsers: Chrome, Chromium, Microsoft Edge, Brave, other Chromium-based browsers
Install snag
Homebrew (Linux/macOS):
Note: There's a name conflict with an older deprecated tool. Use the full tap name:
brew tap grantcarthew/tap
brew install grantcarthew/tap/snag
Go Install:
go install github.com/grantcarthew/snag@latest
Build from Source:
git clone https://github.com/grantcarthew/snag.git
cd snag
go build
./snag --version
Usage
Basic Examples
# Fetch page as Markdown (default)
snag example.com
snag https://example.com
# Save to file
snag -o output.md https://example.com
snag example.com > output.md
# Get raw HTML instead
snag --format html https://example.com
# Get plain text only (strips all HTML)
snag --format text https://example.com
# Quiet mode (content only, no logs)
snag --quiet https://example.com
# Wait for dynamic content to load
snag --wait-for ".content-loaded" https://dynamic-site.com
# Increase timeout for slow sites
snag --timeout 60 https://slow-site.com
# Verbose logging for debugging
snag --verbose https://example.com
Output Formats
snag supports 5 output formats for different use cases. Format names are case-insensitive and support aliases for convenience.
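As a rough illustration of what case-insensitive names and aliases mean in practice, the sketch below maps user input to the five canonical format names. normalize_format is a hypothetical shell helper written for this README, not part of snag, and snag's own implementation may differ.

```shell
# Hypothetical helper (not part of snag): maps a user-supplied format
# name to one of the five canonical names documented in this README.
normalize_format() {
  # Lowercase the input so MD, Markdown and markdown are treated alike.
  fmt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $fmt in
    md|markdown) echo md ;;       # alias: markdown -> md
    text|txt)    echo text ;;     # alias: txt -> text
    html|pdf|png) echo "$fmt" ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
}
```

A wrapper script could call this before invoking snag, for example to decide whether the output is binary (pdf, png) and therefore needs -o or -d rather than a pipe.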
Text Formats
Markdown (default):
Clean, readable text format optimized for AI agents and documentation. Uses 70% fewer tokens than HTML.
# Default format (no flag needed)
snag https://example.com
# Explicit format
snag --format md https://example.com
# Alias also works (backward compatibility)
snag --format markdown https://example.com
# Case-insensitive
snag --format MD https://example.com
snag --format Markdown https://example.com
HTML:
Raw HTML output, preserving original page structure.
# Get raw HTML
snag --format html https://example.com
# Case-insensitive
snag --format HTML https://example.com
Text:
Plain text only, strips all HTML tags and formatting.
# Extract plain text
snag --format text https://example.com
# Alias also works
snag --format txt https://example.com
# Case-insensitive
snag --format TEXT https://example.com
Binary Formats (PDF, PNG)
Binary formats automatically generate filenames to prevent terminal corruption. Files are saved to the current directory unless you specify a location.
PDF:
Visual rendering as a PDF document using Chrome's native rendering engine.
# Auto-generates filename in current directory
snag --format pdf https://example.com
# Creates: 2025-10-22-142033-example-domain.pdf
# Specify custom filename
snag --format pdf -o report.pdf https://example.com
# Save to specific directory with auto-generated name
snag --format pdf -d ~/Downloads https://example.com
# Creates: ~/Downloads/2025-10-22-142033-example-domain.pdf
# Case-insensitive
snag --format PDF https://example.com
PNG:
Full-page screenshot as a PNG image.
# Auto-generates filename in current directory
snag --format png https://example.com
# Creates: 2025-10-22-142033-example-domain.png
# Specify custom filename
snag --format png -o screenshot.png https://example.com
# Save to specific directory with auto-generated name
snag --format png -d ~/screenshots https://example.com
# Creates: ~/screenshots/2025-10-22-142033-example-domain.png
# Case-insensitive
snag --format PNG https://example.com
Why auto-generate filenames?
Binary formats (PDF, PNG) cannot output to stdout because binary data corrupts terminal display. When you don't specify -o or -d, snag automatically generates a timestamped filename in the current directory.
Auto-generated filename format:
yyyy-mm-dd-hhmmss-{page-title-slug}.{ext}
Example: 2025-10-22-142033-github-snag-repo.png
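To make the filename pattern concrete, here is a minimal shell sketch that builds a name of the same shape. make_filename is a hypothetical helper, and its slug rules (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are an assumption. snag's actual slug logic may differ.

```shell
# Hypothetical helper (not part of snag): builds a name shaped like
# yyyy-mm-dd-hhmmss-{page-title-slug}.{ext}.
# Usage: make_filename TITLE EXT [TIMESTAMP]
make_filename() {
  title=$1
  ext=$2
  # Default to the current local time when no timestamp is supplied.
  stamp=${3:-$(date +%Y-%m-%d-%H%M%S)}
  # Assumed slug rules: lowercase, collapse non-alphanumerics to "-",
  # then trim a leading or trailing "-".
  slug=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  printf '%s-%s.%s\n' "$stamp" "$slug" "$ext"
}
```

With the title "Example Domain", extension pdf and timestamp 2025-10-22-142033, this yields 2025-10-22-142033-example-domain.pdf, the same shape as the examples above.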
Common Scenarios
AI Agent Documentation Fetching
# Fetch API documentation for AI context
snag https://api.example.com/docs > api-reference.md
# Pipe directly to AI assistant
snag --quiet https://docs.python.org/3/library/os.html | your-ai-tool
Building a Knowledge Base
# Save multiple pages to a reference directory
snag -o reference/golang-basics.md https://go.dev/doc/tutorial/getting-started
snag -o reference/golang-concurrency.md https://go.dev/doc/effective_go#concurrency
snag -o reference/golang-errors.md https://go.dev/blog/error-handling-and-go
Fetching Dynamic Content
# Wait for JavaScript to render content
snag --wait-for "#main-content" https://single-page-app.com
# Give slow sites more time
snag --timeout 90 --wait-for ".loaded" https://heavy-site.com
Working with Authenticated Tabs
# Step 1: Open browser and log in to your sites
snag --open-browser
# (Manually log in to your private sites in the browser window)
# Step 2: List tabs to see what's available
snag --list-tabs
# Example output:
# Available tabs in browser (4 tabs, sorted by URL):
# [1] about:blank (New Tab)
# [2] https://app.example.com/dashboard (Dashboard)
# [3] https://github.com/myorg/private-repo (My Private Repo)
# [4] https://internal.company.com/docs (Internal Documentation)
# Step 3: Fetch from authenticated tabs without re-logging in
snag -t 3 -o private-repo.md
snag -t "dashboard" -o dashboard.md
snag -t "internal" -o internal-docs.md
# All fetches reuse the existing authenticated session!
Working with Multiple Open Tabs
# Collect documentation from tabs you already have open
snag -t "python" > python-docs.md
snag -t "golang" > golang-docs.md
snag -t "rust" > rust-docs.md
# Use patterns to match specific tabs
snag -t "github.com/.*" > github-content.md
snag -t ".*/dashboard" > dashboard.md
# Fetch by index if you know the tab position
for i in 1 2 3 4; do
  snag -t "$i" -o "tab-$i.md"
done
# Process all open tabs at once
snag --all-tabs --output-dir ~/my-tabs
snag -a -d ~/reference
# Combine --all-tabs with format options
snag --all-tabs --format pdf -d ~/pdfs
snag --all-tabs --format png -d ~/screenshots
Batch Processing URLs
# Process multiple URLs inline
snag -d output/ https://example.com https://github.com https://go.dev
# Process URLs from a file
snag --url-file urls.txt -d output/
# Pipe URLs from stdin
cat urls.txt | snag --url-file - -d output/
# Pipe filtered URLs
grep "^https://docs" urls.txt | snag --url-file - -d ./documentation/
# Using heredoc
snag --url-file - -d pages/ <<EOF
# My URLs
example.com
github.com/grantcarthew/snag
go.dev
EOF
# Process URLs from a file (shell loop alternative)
while read -r url; do
  filename=$(echo "$url" | sed 's/[^a-zA-Z0-9]/_/g').md
  snag --quiet -o "$filename" "$url"
done < urls.txt
# Combine multiple pages
for url in https://example.com/page1 https://example.com/page2; do
  snag --quiet "$url" >> combined.md
  printf '\n---\n\n' >> combined.md
done
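The heredoc example above includes a comment line in the URL list. The shell-loop alternative would pass such lines, and any blank lines, straight to snag as URLs. A small filter avoids that. clean_urls below is a hypothetical helper, not part of snag. It only drops lines that start with #, so URL fragments such as #concurrency are left alone.

```shell
# Hypothetical helper (not part of snag): reads a URL list on stdin and
# drops comment lines (first non-blank character is "#") and blank lines.
# A "#" inside a URL (a fragment) is preserved.
clean_urls() {
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d'
}
```

In the loop alternative, this could replace the plain redirect, for example by piping clean_urls < urls.txt into the while loop instead of reading urls.txt directly.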
CI/CD Integration
# Fetch documentation in CI pipeline
snag --force-headless --timeout 30 https://docs.example.com > docs.md
# Quiet mode for clean logs
snag --quiet --force-headless https://example.com > output.md
Authentication
snag makes it easy to fetch content from authenticated/private sites using persistent browser sessions.
Method 1: Visible Browser Mode
Open a browser, authenticate manually, then snag connects to it:
# Step 1: Open browser in visible mode and log in manually
# Note: Using the --open-browser (-b) switch enables the required DevTools protocol
snag --open-browser
# Step 2: In the browser window, navigate to your site and log in
# (Leave the browser open)
# Step 3: Fetch authenticated content - snag reuses your session
snag https://private.example.com
# Step 4: Fetch more pages with the same session
snag https://private.example.com/dashboard
snag https://private.example.com/settings
Method 2: Force Visible Mode
Let snag launch the browser for you:
# Open browser and navigate to page for authentication
snag --open-browser https://private.example.com
# Authenticate in the browser window that opens
# Then leave it running
# Subsequent calls reuse the session
snag https://private.example.com/other-page
Method 3: Existing Chromium Session
Keep one browser session for multiple snag calls:
# Terminal 1: Start Chromium with remote debugging
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-profile
# Log in to your sites manually in this browser
# Terminal 2: Use snag with the existing session
snag https://authenticated-site1.com
snag https://authenticated-site2.com
snag https://authenticated-site3.com
All three commands share authentication state - no repeated logins required!
Method 4: Using Your Default Chrome Profile
You can use your existing Chrome profile with all its saved logins and cookies:
Option A: Daily workflow - Use snag as your Chrome launcher
If you use snag regularly, you can make it your primary way to launch Chrome:
# Close your regular Chrome first, then launch via snag:
snag --open-browser --user-data-dir ~/.config/google-chrome # Linux
snag --open-browser --user-data-dir ~/.config/chromium # Linux Chromium
snag --open-browser --user-data-dir ~/Library/Application\ Support/Google/Chrome # macOS
# Now browse normally AND use snag for tab fetching:
snag --list-tabs
snag -t 1 # Fetch from any tab
snag https://example.com # Open new tabs
This gives you your full Chrome experience (bookmarks, extensions, history, passwords) PLUS snag's tab management capabilities!
Option B: One-off fetches with your profile
# Must close Chrome first!
snag --user-data-dir ~/.config/google-chrome \
  https://private.example.com
Important caveats:
- Chrome must be closed - You cannot run both Chrome and snag with the same profile simultaneously. Chrome locks profile directories to prevent corruption.
- Risk of corruption - If something goes wrong, you could corrupt your primary profile data. Consider using a separate profile for automation.
- Profile structure - Chrome's --user-data-dir points to the parent directory containing multiple profiles (Default, Profile 1, etc.). Chrome will use the Default profile unless you specify otherwise.
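Before pointing snag at a shared profile, a script can check whether the profile looks locked. The sketch below relies on an assumption about Chromium behaviour rather than anything snag documents: on Linux and macOS, Chromium-based browsers typically create a SingletonLock symlink in the user data directory while running. profile_in_use is a hypothetical helper, and a stale lock left behind by a crash can produce a false positive.

```shell
# Hypothetical helper (not part of snag). Assumption: a running
# Chromium-based browser keeps a "SingletonLock" symlink in its
# user data directory. The link is usually dangling, so test with -L
# (is a symlink) rather than -e (target exists).
# Usage: profile_in_use DIR  -> exit status 0 if the profile looks locked
profile_in_use() {
  [ -L "$1/SingletonLock" ]
}
```

A wrapper could refuse to run snag --user-data-dir against the directory while profile_in_use returns success, and ask the user to close Chrome first.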
Option C: Safer alternative - Use a dedicated profile for snag
# Create and use a dedicated profile for snag
snag --user-data-dir ~/.config/google-chrome/snag-profile \
  --open-browser
# Authenticate once in the browser window
# Profile persists between runs - no need to re-authenticate!
# Subsequent fetches reuse the same profile
snag --user-data-dir ~/.config/google-chrome/snag-profile \
  https://private.example.com
The dedicated profile approach gives you persistence without risking your main Chrome profile.
Advanced Usage
Custom User Agent
Bypass headless detection or mimic specific browsers:
# Linux Firefox user agent
snag --user-agent "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" \
  https://example.com
# Custom bot identifier
snag --user-agent "MyBot/1.0 (+https://example.com/bot)" \
  https://api-docs.example.com
Debugging Failed Fetches
# See what's happening during fetch
snag --verbose https://problematic-site.com
# Full debug output including browser messages
snag --debug https://problematic-site.com 2> debug.log
# Open browser to see what snag sees
snag --open-browser https://problematic-site.com
Working with Browser Tabs
snag can list and fetch content from existing browser tabs, making it easy to reuse authenticated sessions and reduce tab clutter.
List all open tabs:
# See what tabs are currently open
snag --list-tabs
snag -l
# Example output:
# Available tabs in browser (3 tabs, sorted by URL):
# [1] https://app.example.com/dashboard (Dashboard (authenticated))
# [2] https://docs.python.org/3/ (3.13.1 Documentation)
# [3] https://github.com/grantcarthew/snag (grantcarthew/snag: Intelligent web content fetcher)
Fetch from specific tab by index:
# Fetch from first tab
snag --tab 1
snag -t 1
# Fetch from third tab and save to file
snag -t 3 -o docs.md
# Get HTML instead of Markdown
snag -t 2 --format html
# Get as PDF or PNG
snag -t 3 --format pdf -o docs.pdf
snag -t 1 --format png -o screenshot.png
Fetch from tab by URL pattern:
# Exact URL match (case-insensitive)
snag -t "https://docs.python.org/3/"
snag -t "GITHUB.COM/grantcarthew/snag"
# Contains/substring match (processes ALL matching tabs if multiple)
snag -t "dashboard" # Outputs to stdout if 1 match, auto-saves all if multiple
snag -t "python" # Fetches all tabs containing "python"
snag -t "github" -d ./ # Saves all github tabs to current directory
# Regex pattern match (processes ALL matching tabs if multiple)
snag -t "https://.*\.com" # All .com URLs
snag -t ".*/dashboard" # All dashboard URLs
snag -t "(github|gitlab)\.com" # All github or gitlab tabs
Pattern matching behavior:
- Tries in order: exact URL match → contains match → regex match
- Single match: Outputs to stdout (or to file with -o)
- Multiple matches: Auto-saves all tabs with generated filenames (use -d for custom directory)
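The matching order described above can be approximated with grep. This is an illustrative sketch only: match_tabs is a hypothetical helper, not snag's implementation, and details such as how snag normalizes URLs before the exact comparison are not covered here.

```shell
# Hypothetical sketch (not snag's code) of the documented order:
# exact URL match, then substring match, then regex match.
# Reads tab URLs on stdin, one per line, and prints every match from
# the first stage that finds any. All stages ignore case.
# Usage: match_tabs PATTERN < urls
match_tabs() {
  pattern=$1
  urls=$(cat)
  # 1. Exact: whole line equals the pattern, treated as a fixed string.
  hits=$(printf '%s\n' "$urls" | grep -i -x -F -- "$pattern" || true)
  # 2. Contains: the pattern appears anywhere, still a fixed string.
  [ -n "$hits" ] || hits=$(printf '%s\n' "$urls" | grep -i -F -- "$pattern" || true)
  # 3. Regex: the pattern is an extended regular expression.
  [ -n "$hits" ] || hits=$(printf '%s\n' "$urls" | grep -i -E -- "$pattern" || true)
  [ -n "$hits" ] || return 1
  printf '%s\n' "$hits"
}
```

Because each stage returns all of its matches, a short substring such as "github" can select several tabs at once, which is why a pattern may process more tabs than you expect.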
Why use tabs?
- Reuse authenticated sessions without re-logging in
- Fetch from multiple pages without creating new tabs
- Quick access to content you already have open
Tab closing behavior:
# Close tab after fetching (default in headless mode)
snag --close-tab https://example.com
# Keep tab open (default in visible mode)
snag https://example.com
Custom Remote Debugging Port
# Use different port if 9222 is busy
snag --port 9223 https://example.com
# Connect to Chromium running on custom port
chromium --remote-debugging-port=9223 &
snag --port 9223 https://example.com
CLI Reference
Core Arguments
<url> URL to fetch (required, unless using --list-tabs or --tab)
-v, --version Display version information
-h, --help Show help message and exit
Tab Operations
-l, --list-tabs List all open tabs in the browser
-t, --tab <PATTERN> Fetch from existing tab by index (1, 2, 3...) or URL pattern
Patterns can be:
- Index number: 1, 2, 3 (tab position)
- Exact URL: https://example.com (case-insensitive)
- Substring: dashboard, github, docs (contains match)
- Regex: https://.*\.com, .*/dashboard, (github|gitlab)\.com
-a, --all-tabs Process all open browser tabs (saves with auto-generated filenames)
Saves to --output-dir if given, otherwise to the current directory
Note: Tabs are sorted alphabetically by URL (primary), then Title (secondary), then ID (tertiary) for predictable ordering. Chrome DevTools Protocol doesn't guarantee visual left-to-right tab order, so snag sorts tabs to ensure consistent, reproducible results. Tab [1] = first tab alphabetically by URL, not the first visual tab in your browser.
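The ordering in the note can be reproduced with sort to see why tab [1] may not be the leftmost tab in the browser. number_tabs is a hypothetical helper, the tab-separated input format is an assumption made for this example, and the byte-wise (LC_ALL=C) comparison may not match snag's exact collation.

```shell
# Hypothetical sketch (not snag's code): order tabs by URL, then title,
# then ID, and number them from 1.
# stdin: one tab per line as URL<TAB>TITLE<TAB>ID (assumed format)
number_tabs() {
  tab=$(printf '\t')
  # LC_ALL=C gives a stable byte-wise ordering regardless of locale.
  LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3,3 |
    awk -F '\t' '{ printf "[%d] %s (%s)\n", NR, $1, $2 }'
}
```

Feeding it the tabs in visual order shows the renumbering: an about:blank tab opened last still becomes [1] because it sorts before any https:// URL.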
Output Control
-o, --output <file> Save output to file instead of stdout
-d, --output-dir <dir> Save files with auto-generated names to directory
-f, --format <FORMAT> Output format: md (default) | html | text | pdf | png
Format aliases: markdown→md, txt→text
Case-insensitive: MD, MARKDOWN, Html, PDF, etc.
Page Loading
--timeout <seconds> Page load timeout in seconds (default: 30)
-w, --wait-for <selector> Wait for CSS selector before extracting content
Browser Control
-p, --port <port> Chromium remote debugging port (default: 9222)
-c, --close-tab Close the browser tab after fetching content
--force-headless Force headless mode even if Chromium is running
-b, --open-browser Open Chromium browser in visible state (no URL required)
-k, --kill-browser Kill browser processes with remote debugging enabled
Logging/Debugging
--verbose Enable verbose logging output
-q, --quiet Suppress all output except errors and content
--debug Enable debug output with CDP messages
Request Control
--user-agent <string> Custom user agent string (bypass headless detection)
Troubleshooting
Browser Issues
"Browser not found" error
snag cannot locate Chrome/Chromium on your system.
Solutions:
- Install Chromium: brew install chromium
- Install Chrome from https://www.google.com/chrome/
- Ensure Chromium/Chrome is in your system PATH
"Failed to connect to existing browser"
Cannot connect to running browser instance.
Solutions:
- Ensure Chromium/Chrome is launched with --remote-debugging-port=9222
- Try different port: snag --port 9223 https://example.com
- Kill existing Chromium/Chrome processes and let snag launch a new instance
"Stuck or lingering browser processes"
Browser processes with remote debugging enabled remain after snag exits.
Solutions:
- Kill all debugging browsers: snag --kill-browser or snag -k
- Kill specific port only: snag --kill-browser --port 9223
- Note: Only kills browsers with --remote-debugging-port enabled (development browsers), never regular browsing sessions
- Safe for scripting: exits with code 0 even if no browsers found (idempotent)
Diagnostic Information
Get comprehensive diagnostic information about your snag environment:
# Run diagnostics
snag --doctor
# Check specific port
snag --doctor --port 9223
This displays:
- snag and Go versions (with update check)
- Detected browser and version
- Browser connection status and tab counts
- Profile locations for all common browsers
- Environment variables
- Working directory
Use this when:
- Troubleshooting issues
- Reporting bugs (include doctor output)
- Checking if browser is running
- Finding profile paths
- Verifying snag installation
Authentication Issues
"Authentication required" error
Page requires login but snag cannot authenticate.
Solutions:
- Open browser with snag --open-browser, log in, then run snag again
- Use --list-tabs to find authenticated tabs, then --tab to fetch from them
- Browser session persists authentication across snag calls
Tab Issues
"No Chrome instance running" when using --list-tabs or --tab
Tab features require an existing browser with remote debugging enabled.
Solutions:
- Open browser first: snag --open-browser
- Or manually start Chrome/Chromium: chromium --remote-debugging-port=9222
- Then run snag --list-tabs to verify connection
"Tab index out of range" or "No tab matches pattern"
Cannot find the specified tab.
Solutions:
- Run snag --list-tabs to see available tabs and their indexes
- Tab indexes are 1-based (first tab is 1, not 0)
- Tabs are sorted by URL, not visual browser order - tab [1] is first alphabetically by URL
- For patterns, try simpler matches: snag -t "example" instead of complex regex
- Remember: pattern matching is case-insensitive
Pattern not matching expected tab
Your pattern matches a different tab than expected.
Solutions:
- Use --list-tabs to see exact URLs of open tabs
- Be more specific with your pattern: use full URL instead of substring
- Remember: multiple matching tabs will all be processed and auto-saved (not just first match)
- For single specific tab: use exact URL pattern or tab index: snag -t 3
Timeout Issues
"Page load timeout" error
Page takes too long to load.
Solutions:
- Increase timeout: snag --timeout 60 https://example.com
- Use --wait-for for specific element: snag --wait-for ".content" https://example.com
- Check network connectivity
- Try --verbose to see what's happening
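For scripted fetches, the "increase timeout" advice can be automated with a small retry loop. This sketch assumes snag exits with a non-zero status when a page load times out, which this README does not state explicitly. fetch_with_retry and the 30/60/90 second steps are illustrative choices, not part of snag.

```shell
# Hypothetical helper (not part of snag): retry a fetch with a longer
# --timeout each time. Assumes snag returns non-zero on a timeout.
# Content goes to stdout; retry notices go to stderr.
# Usage: fetch_with_retry URL > page.md
fetch_with_retry() {
  url=$1
  for t in 30 60 90; do
    if snag --quiet --timeout "$t" "$url"; then
      return 0
    fi
    echo "fetch failed with --timeout $t, retrying" >&2
  done
  return 1
}
```

Keeping the notices on stderr matches snag's own split between content and logs, so the function's output can still be redirected or piped safely.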
Page loads but content is missing
Dynamic content hasn't appeared yet.
Solutions:
- Use --wait-for with selector: snag --wait-for "#main-content" https://example.com
- Increase timeout to allow for slow loading
- Inspect page with --format html to see raw output
Output Issues
Output is empty or incomplete
Fetched page but content is missing.
Solutions:
- Try --format html to see raw HTML
- Try --format text to see plain text extraction
- Use --verbose to check if page loaded correctly
- Page may require authentication (see authentication section)
- Content may be loaded dynamically (use --wait-for)
Markdown formatting looks wrong
Converted Markdown has formatting issues.
Solutions:
- Use --format html to get raw HTML instead
- Use --format text for plain text only (no formatting)
- Some complex HTML structures may not convert perfectly to Markdown
- Report specific issues at https://github.com/grantcarthew/snag/issues
Platform-Specific Issues
Linux: "No DISPLAY environment variable"
Running in headless environment without display.
Solutions:
- Headless mode should work automatically
- Ensure Xvfb is installed: sudo apt install xvfb
- Use --force-headless explicitly
macOS: "Chromium.app cannot be opened"
macOS security blocking Chromium/Chrome launch.
Solutions:
- Open Chromium manually first: open -a Chromium or open -a "Google Chrome"
- Check System Preferences > Security & Privacy
- Allow the browser in privacy settings
macOS: Browser processes remain after closing window
On macOS, closing a Chrome/Chromium window doesn't quit the application - processes continue running in the background.
This is normal macOS behavior. To fully quit:
- Press Cmd+Q in the browser window
- Right-click Chrome icon in Dock → Quit
- Or: pkill -f "Chrome.*remote-debugging-port"
Getting Help
Still having issues?
- Run with --debug flag for detailed logs
- Check existing issues: https://github.com/grantcarthew/snag/issues
- Create new issue with:
  - snag version: snag --version
  - Operating system and version
  - Full command you ran
  - Complete error message
  - Output from --debug flag
How It Works
Smart Browser Management
- Session Detection: Auto-detects existing Chromium-based browser instance with remote debugging enabled
- Mode Selection:
  - If Chromium browser is running → Connect to existing session (preserves auth/cookies)
  - If no browser found → Launch headless mode
  - Use --open-browser to open visible browser for authentication
- Tab Management:
  - List tabs with --list-tabs to see what's currently open
  - Fetch from specific tabs using --tab (by index or URL pattern)
  - Tabs stay open in visible mode, close in headless mode (or use --close-tab)
  - Reuse authenticated sessions without creating new tabs
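The mode selection described above can be approximated from a script by probing the remote debugging port. The sketch uses the /json/version HTTP endpoint that Chromium exposes when started with --remote-debugging-port. Whether snag performs its detection this way is an assumption, and browser_mode is a hypothetical helper.

```shell
# Hypothetical sketch (not snag's code): report which mode would apply.
# Assumption: a browser with remote debugging enabled answers
# http://127.0.0.1:PORT/json/version.
# Usage: browser_mode [PORT]   -> prints "connect" or "headless"
browser_mode() {
  port=${1:-9222}
  if curl -sf --max-time 2 "http://127.0.0.1:${port}/json/version" >/dev/null 2>&1; then
    echo connect    # existing session found: reuse it (keeps auth/cookies)
  else
    echo headless   # nothing listening: a headless browser would be launched
  fi
}
```

This can be handy in scripts that need to know in advance whether a fetch will reuse a logged-in session, for example to skip private URLs when only headless mode is available.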
Output Routing
- stdout: Content only (HTML/Markdown) - enables piping to other tools
- stderr: All logs, warnings, errors, progress indicators
This design makes snag perfect for shell pipelines and AI agent integration.
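Because content and logs travel on separate streams, a script can capture each one independently. The wrapper below is a minimal sketch. fetch_page is a hypothetical helper, and it relies only on the stdout/stderr split described above.

```shell
# Hypothetical helper (not part of snag): save page content and logs
# to separate files using the documented stream split.
# Usage: fetch_page URL CONTENT_FILE LOG_FILE
fetch_page() {
  # stdout carries only the page content; stderr carries logs,
  # warnings, errors and progress indicators.
  snag "$1" > "$2" 2> "$3"
}
```

For example, fetch_page https://example.com page.md snag.log leaves clean Markdown in page.md, with any diagnostics in snag.log for later inspection.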
Technology
- Language: Go 1.25.3
- Browser Control: Chrome DevTools Protocol via go-rod/rod
- HTML Conversion: html-to-markdown/v2
- CLI Framework: cobra
Contributing
Contributions welcome! Please:
- Check existing issues: https://github.com/grantcarthew/snag/issues
- Create issue for bugs or feature requests
- Submit pull requests against main branch
Reporting Issues
Include:
- snag version: snag --version
- Operating system and version
- Full command and error message
- Output from --debug flag
License
snag is licensed under the Mozilla Public License 2.0.
Third-Party Licenses
This project uses the following open-source libraries:
- go-rod/rod - MIT License
- cobra - Apache 2.0 License
- html-to-markdown - MIT License
See the LICENSES directory for full license texts.
Author
Grant Carthew grant@carthew.net