reverse-engineering-api
Reverse engineer web APIs by capturing browser traffic (HAR files) and generating production-ready Python API clients for automation and data extraction.
Introduction
This skill acts as an autonomous engineering assistant that bridges the gap between web browsing and API development. It turns manual website interactions, such as logging in, searching, or navigating through paginated content, into robust, reusable Python API clients. Using Playwright MCP for browser control and HAR (HTTP Archive) traffic analysis, the agent captures the network requests that drive modern web applications and filters out noise such as static assets, tracking scripts, and advertisements, leaving only the functional API endpoints. It is built for developers, data engineers, and automation specialists who need to integrate with undocumented web services or extract data programmatically without building fragile screen-scraping bots.
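The noise filtering described above can be illustrated with a minimal sketch. It is not the skill's actual har_filter script, whose rules are not shown here. The extension list, tracker host fragments, MIME types, and the names `is_api_entry`, `filter_har`, and `SAMPLE_HAR` are all assumptions made for illustration; only the `log.entries`, `request.url`, and `response.content.mimeType` fields come from the HAR format itself.

```python
from urllib.parse import urlparse

# Assumed heuristics for illustration; a real filter would likely be configurable.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".jpeg", ".gif",
                     ".svg", ".ico", ".woff", ".woff2", ".map")
TRACKER_HOST_FRAGMENTS = ("google-analytics", "googletagmanager", "doubleclick")
API_MIME_TYPES = ("application/json", "application/xml", "text/xml")


def is_api_entry(entry: dict) -> bool:
    """Return True if a HAR entry looks like a functional API call."""
    request = entry.get("request", {})
    response = entry.get("response", {})
    parsed = urlparse(request.get("url", ""))

    # Drop static assets by file extension.
    if parsed.path.lower().endswith(STATIC_EXTENSIONS):
        return False
    # Drop requests to known tracking or advertising hosts.
    host = parsed.netloc.lower()
    if any(fragment in host for fragment in TRACKER_HOST_FRAGMENTS):
        return False
    # Keep only responses with a structured-data MIME type.
    mime = response.get("content", {}).get("mimeType", "").lower()
    return mime.startswith(API_MIME_TYPES)


def filter_har(har: dict) -> list[dict]:
    """Keep only the HAR entries that pass the API heuristics."""
    entries = har.get("log", {}).get("entries", [])
    return [entry for entry in entries if is_api_entry(entry)]


# Hypothetical capture: one API call, one script, one tracker hit.
SAMPLE_HAR = {"log": {"entries": [
    {"request": {"method": "GET", "url": "https://example.com/api/items?page=1"},
     "response": {"content": {"mimeType": "application/json; charset=utf-8"}}},
    {"request": {"method": "GET", "url": "https://example.com/static/app.js"},
     "response": {"content": {"mimeType": "application/javascript"}}},
    {"request": {"method": "POST", "url": "https://www.google-analytics.com/collect"},
     "response": {"content": {"mimeType": "application/json"}}},
]}}

api_entries = filter_har(SAMPLE_HAR)
```

With a real capture, the same function would be applied to the result of `json.load` on the HAR file. Filtering on both the URL and the response MIME type is a deliberate choice in this sketch: tracker endpoints often return JSON, so a MIME check alone would let them through.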
- Browser Automation: Uses Playwright to simulate human-like interaction with stealth capabilities for effective traffic capture.
- HAR Analysis Pipeline: Employs specialized utility scripts (har_filter, har_analyze, har_validate) to transform messy raw network traffic into structured API documentation.
- Automated Code Generation: Produces clean, type-hinted, and production-ready Python code with error handling and session management.
- Pattern Recognition: Automatically identifies authentication schemes, pagination logic, and request schema structures from captured headers and payloads.
- Validation Loop: Includes a strict validation phase with a mandatory 90% coverage threshold to ensure the generated client accurately mirrors the observed behavior.
- Workflow: The agent follows a linear, trackable process: Browser Capture -> Traffic Filtering -> Endpoint Analysis -> Code Generation -> Validation.
- Inputs: User-provided tasks, website URLs, and browser interaction steps.
- Outputs: A structured Python API module (api_client.py), full README documentation, and a record of the captured HAR analysis.
- Operational Constraints: Requires Playwright MCP integration; performance may be affected by advanced bot-detection or dynamic anti-scraping measures on target websites.
- Best Practices: Always verify the generated code against the provided validation tools to ensure all endpoints and headers (e.g., CSRF tokens, custom cookies) are correctly implemented.
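The 90% coverage threshold in the validation phase can be sketched as a simple set comparison. How har_validate actually measures coverage is not described here, so this example assumes coverage means the fraction of observed endpoints (method plus path) that the generated client implements. The names `endpoint_key`, `coverage_report`, and `COVERAGE_THRESHOLD` are hypothetical.

```python
from urllib.parse import urlparse

# The 90% threshold comes from the description; its exact definition is assumed.
COVERAGE_THRESHOLD = 0.90

Endpoint = tuple[str, str]


def endpoint_key(method: str, url: str) -> Endpoint:
    """Normalize a request to (METHOD, path), ignoring host and query string."""
    path = urlparse(url).path.rstrip("/") or "/"
    return method.upper(), path


def coverage_report(observed: set[Endpoint], implemented: set[Endpoint]) -> dict:
    """Compare endpoints seen in the HAR with those the client implements."""
    covered = observed & implemented
    # An empty capture is treated as fully covered to avoid dividing by zero.
    ratio = len(covered) / len(observed) if observed else 1.0
    return {
        "coverage": ratio,
        "missing": sorted(observed - implemented),
        "passed": ratio >= COVERAGE_THRESHOLD,
    }


# Hypothetical data: ten observed endpoints, nine implemented by the client.
observed = {endpoint_key("GET", f"https://example.com/api/resource{i}") for i in range(10)}
implemented = {endpoint_key("get", f"/api/resource{i}/") for i in range(9)}

report = coverage_report(observed, implemented)
```

Here nine of ten observed endpoints are implemented, so the report meets the threshold and lists `("GET", "/api/resource9")` as missing. A fuller check would also compare headers such as CSRF tokens and cookies, which this sketch leaves out.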
Repository Stats
- Stars: 664
- Forks: 60
- Open Issues: 1
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 3, 2026, 09:15 AM