remote-browser

Controls a local or remote headless browser for automated web navigation, data extraction, form interaction, and testing from sandboxed environments.

Introduction

The remote-browser skill is an automation toolkit for agents that run in restricted or headless environments, such as cloud virtual machines, continuous integration (CI) pipelines, and coding-agent sandboxes. It connects sandboxed compute resources to the open web, so agents can carry out browser-based tasks without a GUI display. The skill drives the browser through the Chrome DevTools Protocol (CDP), which gives fine-grained control over browser sessions, tabs, and page elements. This makes it suited to developers and AI agents doing web scraping, automated testing, or site interaction.

  • Full browser lifecycle management: Initiate, navigate, refresh, and terminate headless Chromium sessions directly via terminal commands or the Python API.
  • Element-aware interaction: Parses page state and assigns an index to each clickable element, so agents can target clicks, inputs, hovers, and double-clicks precisely.
  • Comprehensive data extraction: Retrieve page titles, full HTML source, text content, and element attributes through simple diagnostic commands.
  • Network and session support: Manage persistent cookies, handle multiple browser tabs, and tunnel local development servers to the cloud via Cloudflare for easy testing.
  • Flexible execution: Supports chaining commands for efficient script-like operations and provides an integrated Python environment for executing complex logic within the browser context.
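
To make the element-aware interaction idea concrete, here is a minimal sketch of how clickable elements could be numbered in document order. It is an illustration only, not the skill's implementation: the names `ClickableIndexer` and `index_clickables`, and the set of tags treated as clickable, are assumptions chosen for this example, and it works on static HTML with the standard library rather than on a live page.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the skill's rule set).
CLICKABLE_TAGS = {"a", "button", "input", "select", "textarea"}


class ClickableIndexer(HTMLParser):
    """Assigns a running index to each interactive element in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (index, tag, attribute dict)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        is_clickable = tag in CLICKABLE_TAGS or "onclick" in attr_map
        # Hidden inputs are never visible, so they do not receive an index.
        if tag == "input" and attr_map.get("type") == "hidden":
            is_clickable = False
        if is_clickable:
            self.elements.append((len(self.elements), tag, attr_map))


def index_clickables(html):
    """Return (index, tag, attributes) for every clickable element in the HTML string."""
    parser = ClickableIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    page = '<a href="/login">Log in</a><input name="q"><div onclick="go()">Go</div>'
    for idx, tag, attrs in index_clickables(page):
        print(idx, tag, attrs)
```

An agent working from such a list can refer to "element 1" instead of composing a CSS selector, which is the convenience the indexed approach is meant to provide.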

When using this skill, agents drive a headless browser that can handle modern web pages. Run the doctor command to verify environment compatibility before deployment. Users can connect to an existing Chrome instance through its CDP URL or use the built-in cloud connectivity features to scale out. The skill also supports advanced input scenarios, including file uploads, keyboard sequence emulation, and coordinate-based clicking. Sessions are designed to persist, so agents should end each one with the close command to avoid leaking resources in cloud environments. The skill integrates natively with Python-based agents and supports various authentication mechanisms when private web navigation requires them.
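
For connecting to an existing Chrome instance, it helps to know where a CDP URL comes from. A Chrome process started with `--remote-debugging-port` serves a small HTTP API, and its `/json/version` endpoint returns a JSON document whose `webSocketDebuggerUrl` field is the browser-level WebSocket address. The sketch below resolves that address using only the standard library. The function names and the port 9222 in the usage note are assumptions for illustration, and how the skill itself accepts the resulting URL is not specified here.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def version_endpoint(cdp_http_url):
    """Build the /json/version URL for a DevTools HTTP endpoint."""
    parts = urlsplit(cdp_http_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL, got {cdp_http_url!r}")
    # Keep scheme and host:port, discard any path, query, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/json/version", "", ""))


def websocket_url_from(payload):
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    try:
        return info["webSocketDebuggerUrl"]
    except KeyError:
        raise ValueError("response has no webSocketDebuggerUrl field") from None


def discover_websocket_url(cdp_http_url, timeout=5.0):
    """Query a running browser and return its WebSocket debugger URL."""
    with urlopen(version_endpoint(cdp_http_url), timeout=timeout) as response:
        return websocket_url_from(response.read().decode("utf-8"))
```

With a browser listening locally, `discover_websocket_url("http://127.0.0.1:9222")` would return an address of the form `ws://127.0.0.1:9222/devtools/browser/<id>`. Separating URL construction and response parsing from the network call keeps the two pure helpers easy to check without a running browser.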

Repository Stats

Stars: 91,320
Forks: 10,399
Open Issues: 239
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: Apr 30, 2026, 09:11 AM