browsing
Direct Chrome browser control via DevTools Protocol for automated sessions, multi-tab management, form interaction, and structured content extraction.
Introduction
The Browsing skill provides a single interface for controlling Chrome directly through the Chrome DevTools Protocol (CDP). Designed for agents and developers, it replaces manual browser testing with an automated, programmable workflow. It is built around the use_browser MCP tool, which controls browser sessions: navigation, element interaction, tab management, and DOM-level data extraction. Because it talks to CDP directly, it offers a lighter, more responsive alternative to high-level browser automation frameworks for specific use cases such as persistent authenticated sessions or performance-constrained environments. Every interaction, from clicking a button to executing custom JavaScript, triggers an auto-capture step that generates structured markdown, HTML snapshots, and viewport screenshots, giving a complete audit trail of the agent's actions.
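To make "directly via CDP" concrete, here is a minimal sketch of how CDP commands are framed: each one is a JSON object with an id, a method name, and params, sent over a WebSocket to the browser. Page.navigate and Runtime.evaluate are standard CDP methods; the helper name cdp_command is hypothetical, and the sketch only builds the messages rather than opening a connection. It is not the skill's own implementation.

```python
import itertools
import json

# Monotonic ids let a client match each response to its command.
_ids = itertools.count(1)

def cdp_command(method, **params):
    """Serialize one CDP command as the JSON text sent over the WebSocket.

    cdp_command is a hypothetical helper for illustration only.
    """
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# Navigate the current page, then read its title via JavaScript.
navigate = cdp_command("Page.navigate", url="https://example.com")
evaluate = cdp_command(
    "Runtime.evaluate", expression="document.title", returnByValue=True
)

print(navigate)
print(evaluate)
```

A tool such as use_browser presumably hides this framing behind its actions, so callers never write these messages by hand.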
- Full CDP support for granular browser control: navigate, click, type, hover, drag-and-drop, and execute arbitrary JavaScript code.
- Intelligent auto-capture system: automatically records page HTML, structured markdown, console logs, and screenshots for every DOM action to ensure traceability.
- Flexible tab management: support for opening, closing, and toggling focus across multiple browser tabs within a single session.
- Robust form automation: handles file uploads, complex select inputs, and keyboard event simulation (e.g., Tab, Enter, special keys) natively.
- DOM inspection and extraction: extract specific attributes, full text content, or render-ready markdown, facilitating efficient data scraping and web analysis.
- Visual mode toggling: dynamic switching between headed and headless execution modes for debugging or background automation tasks.
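The features above could be driven by a short sequence of use_browser calls. The sketch below shows one possible shape for such a sequence. The argument names ("action", "selector", "payload"), the action strings, and the browser_call helper are all assumptions made for illustration; the source does not give the tool's schema, so check the tool's own description before relying on any of them.

```python
import json

def browser_call(action, selector=None, payload=None):
    """Build one tool-call request for use_browser.

    ASSUMPTION: the field names "action", "selector", and "payload" are
    hypothetical; they are not confirmed by the source.
    """
    args = {"action": action}
    if selector is not None:
        args["selector"] = selector
    if payload is not None:
        args["payload"] = payload
    return {"tool": "use_browser", "arguments": args}

# A login flow: open the page, fill a field, submit, read the result.
steps = [
    browser_call("navigate", payload="https://example.com/login"),
    browser_call("type", selector="#email", payload="user@example.com"),
    browser_call("click", selector="button[type=submit]"),
    browser_call("extract", payload="markdown"),
]

for step in steps:
    print(json.dumps(step))
```

Keeping each step as plain data makes it easy to log the sequence next to the auto-captured files for later review.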
- When using headed mode on Linux or WSL2, ensure a display server is available (the DISPLAY environment variable must be set).
- Use CSS selectors for precise interaction with web elements; the underlying system handles element lookup and coordinate-based mouse events.
- Treat the auto-captured files in the session directory as the primary source of truth for decision-making before executing subsequent navigation or extraction steps.
- Some operations (such as show_browser or hide_browser) restart the Chrome instance, which may clear POST state data.
- This tool is ideal for scenarios where Playwright is too heavy or when existing, persistent browser sessions (e.g., authenticated logins) need to be maintained across multiple agent steps.
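The note about element lookup and coordinate-based mouse events can be illustrated with a sketch of how a selector click might be lowered to CDP: query the element's bounding box, compute its center, then send a press and release at that point. Runtime.evaluate and Input.dispatchMouseEvent are standard CDP methods, but the helper names are hypothetical and the source does not say that the skill works exactly this way.

```python
import json

def center_of(rect):
    """Return the center point of a box given as x, y, width, height."""
    return rect["x"] + rect["width"] / 2, rect["y"] + rect["height"] / 2

def rect_query(selector):
    """CDP command asking the page for an element's bounding box.

    json.dumps quotes the selector so it is a valid JavaScript string.
    The expression throws if no element matches the selector.
    """
    expression = (
        "(() => { const r = document.querySelector(%s).getBoundingClientRect();"
        " return {x: r.x, y: r.y, width: r.width, height: r.height}; })()"
        % json.dumps(selector)
    )
    return {
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }

def click_events(x, y):
    """CDP commands for a left click at viewport coordinates (x, y)."""
    return [
        {
            "method": "Input.dispatchMouseEvent",
            "params": {"type": kind, "x": x, "y": y,
                       "button": "left", "clickCount": 1},
        }
        for kind in ("mousePressed", "mouseReleased")
    ]

# Example box as it might come back from the bounding-box query.
x, y = center_of({"x": 100, "y": 40, "width": 80, "height": 20})
print(rect_query("button.submit")["params"]["expression"])
print(click_events(x, y))
```

Clicking at real coordinates, rather than calling element.click() in JavaScript, produces the same input events a user's mouse would, which is one reason a CDP-based tool might take this route.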
Repository Stats
- Stars: 266
- Forks: 39
- Open Issues: 5
- Language: JavaScript
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 3, 2026, 05:11 AM