agent-browser

Advanced browser automation for research, web interaction, and data extraction within secure container environments.

Introduction

The agent-browser skill gives AI agents web navigation and interaction capabilities and is designed specifically for the NanoClaw containerized ecosystem. Agents use it to carry out multi-step tasks the way a person would in a browser: navigating sites, filling dynamic forms, clicking interactive elements, and extracting structured data. It suits researchers, engineers, and power users who need to automate web-based workflows without exposing their host environment to external web content, because the entire browsing session is sandboxed within a Linux container.
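The workflow described above (navigate, inspect the page, then fill and click) can be sketched as a small command planner. This is a minimal illustration under stated assumptions, not the skill's documented interface: the binary name `agent-browser` and the subcommand names `open`, `snapshot`, `fill`, and `click` are assumed from the capabilities listed on this page, and the helper functions are hypothetical. Only the `-i` flag and the `@e1` ref style come from the text itself.

```python
import shlex

# Assumption: the CLI is invoked as "agent-browser" inside the container.
CLI = "agent-browser"


def argv(*parts: str) -> list[str]:
    """Build the argument vector for one CLI call (hypothetical helper)."""
    return [CLI, *parts]


def discover_plan(url: str) -> list[list[str]]:
    """Phase 1: open the page, then take an interactive snapshot to learn refs."""
    return [argv("open", url), argv("snapshot", "-i")]


def interact_plan(fields: dict[str, str], submit_ref: str) -> list[list[str]]:
    """Phase 2: fill each field by ref, then click the submit element.

    The refs must come from the snapshot taken in phase 1; the subcommand
    names "fill" and "click" are assumptions, not confirmed by the source.
    """
    steps = [argv("fill", ref, value) for ref, value in fields.items()]
    steps.append(argv("click", submit_ref))
    return steps


def render(plan: list[list[str]]) -> list[str]:
    """Render each step as a shell-safe command line for logging or review."""
    return [shlex.join(step) for step in plan]


if __name__ == "__main__":
    for line in render(discover_plan("https://example.com/login")):
        print(line)
    for line in render(interact_plan({"@e1": "ada@example.com"}, "@e3")):
        print(line)
```

Splitting the plan into two phases keeps the refs honest: nothing in the second phase is built until a snapshot has been read. To actually execute a step, each argument list could be passed to `subprocess.run(step, check=True)` in an environment where the CLI is installed.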

  • Full web navigation: opens pages, goes back and forward, reloads, and generates PDFs.

  • Intelligent snapshotting: generates accessibility trees and identifies interactive elements via DOM refs (e.g., @e1).

  • Interaction suite: supports clicking, double-clicking, typing, checking boxes, selecting dropdown values, hovering, and uploading files.

  • Information retrieval: extracts text, innerHTML, values, attributes, and page metadata, or counts specific elements.

  • Semantic search: finds and interacts with elements by human-readable attributes such as roles, labels, placeholders, or visible text.

  • Session management: supports cookie operations, local storage manipulation, and saving/loading authentication states for persistence.

  • JavaScript execution: runs custom scripts directly in the browser context for advanced page manipulation.

  • Wait utilities: waits for specific elements, network idle states, text appearance, or URL patterns.

  • Best practice: Always perform a snapshot before attempting interactions to acquire current, valid element references.

  • Efficiency: Use the interactive snapshot (-i) to restrict the tree to interactive elements, which reduces token overhead during processing.

  • Persistence: Leverage state saving for authenticated sessions to avoid repetitive login procedures.

  • Isolation: The skill operates through Bash calls to the container environment, so web content never gets direct access to the host machine's sensitive resources.

  • Monitoring: For long-running tasks, use the screenshot feature to periodically capture the visual state of the browser for agent verification.
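The snapshot-first practice above can be enforced in code by checking that a ref appears in the most recent snapshot before acting on it. The sketch below is illustrative only: the snapshot line format (`[ref=e1]`) is an assumption about the output, the sample text is invented for the example, and both helper functions are hypothetical. The `@e1` addressing style is the only part taken from this page.

```python
import re

# Assumption: snapshot output marks elements either as "[ref=e1]" or "@e1".
REF_PATTERN = re.compile(r"(?:\[ref=|@)(e\d+)\]?")


def extract_refs(snapshot_text: str) -> list[str]:
    """Return the unique refs found in a snapshot, normalized to "@eN" form."""
    refs: list[str] = []
    for match in REF_PATTERN.finditer(snapshot_text):
        ref = "@" + match.group(1)
        if ref not in refs:
            refs.append(ref)
    return refs


def require_fresh(ref: str, snapshot_text: str) -> str:
    """Raise if the ref is absent from the latest snapshot, else return it."""
    if ref not in extract_refs(snapshot_text):
        raise ValueError(f"{ref} not in latest snapshot; take a new snapshot first")
    return ref


# Invented sample output, used only to exercise the helpers.
SAMPLE_SNAPSHOT = """\
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign in" [ref=e3]
"""

if __name__ == "__main__":
    print(extract_refs(SAMPLE_SNAPSHOT))
    print(require_fresh("@e3", SAMPLE_SNAPSHOT))
```

Failing fast on an unknown ref is a deliberate choice here: after a navigation or a dynamic update, refs from an older snapshot may no longer point at the intended element, so re-snapshotting is safer than retrying the interaction blindly.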

Repository Stats

Stars: 28,347
Forks: 12,710
Open Issues: 789
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: Apr 29, 2026, 12:26 PM