baoyu-url-to-markdown

Fetch any URL and convert to high-quality markdown. Supports site-specific adapters for X, YouTube, and Hacker News. Handles login, CAPTCHA, and media downloads via Chrome CDP.

Introduction

The baoyu-url-to-markdown skill fetches web pages and converts them to markdown, and is aimed at researchers and other users who need reliable content extraction. It drives Chrome through the Chrome DevTools Protocol (CDP) via the baoyu-fetch CLI, so it captures the page as rendered, including content that is loaded dynamically. This makes it well suited to archiving articles, transcripts, and discussion threads where standard curl-based tools fail. Specialized adapters for platforms such as X (Twitter), YouTube, and Hacker News extract structured data or transcripts directly. For generic pages, it uses Defuddle and Readability logic to produce clean, readable markdown. It is intended for knowledge workers, developers, and content curators who want to bring web data into local LLM workflows or knowledge bases such as Obsidian or Logseq.
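
The split between site-specific adapters and a generic fallback can be pictured as a lookup on the URL's hostname. The sketch below is only an illustration of that idea: the adapter names, the host lists, and the `selectAdapter` function are assumptions made for this example and are not taken from the skill's source.

```typescript
// Hypothetical adapter names; the real registry in the skill may differ.
type AdapterName = "x" | "youtube" | "hackernews" | "generic";

// Hostnames each site-specific adapter claims (illustrative, not exhaustive).
const ADAPTER_HOSTS: Record<Exclude<AdapterName, "generic">, string[]> = {
  x: ["x.com", "twitter.com"],
  youtube: ["youtube.com", "youtu.be"],
  hackernews: ["news.ycombinator.com"],
};

// Return the first adapter whose host list matches the URL. Anything else
// falls back to "generic", standing in for the Defuddle/Readability path.
function selectAdapter(rawUrl: string): AdapterName {
  // Normalize the host: lowercase and drop common "www."/"m."/"mobile." prefixes.
  const host = new URL(rawUrl).hostname
    .toLowerCase()
    .replace(/^(www|m|mobile)\./, "");

  for (const [name, hosts] of Object.entries(ADAPTER_HOSTS)) {
    // Match the host exactly or as a subdomain of a listed host.
    if (hosts.some((h) => host === h || host.endsWith("." + h))) {
      return name as AdapterName;
    }
  }
  return "generic";
}
```

Matching on the normalized hostname rather than on a substring of the full URL avoids false positives, for example a domain that merely ends in the letters `x.com`.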

  • Chrome CDP rendering that captures dynamically loaded content and gets past client-side obfuscation.

  • Specialized adapter registry for high-fidelity extraction from X, YouTube, and Hacker News.

  • Interaction support for handling login screens and CAPTCHAs via configurable wait-for-interaction modes.

  • Media processing that downloads images and videos and automatically rewrites the corresponding markdown links.

  • Flexible output formats including clean Markdown for readability or JSON for programmatic data pipelines.

  • Seamless integration with local filesystems for organized archival.

  • Designed for environments with Node.js and Bun installed.

  • First-time setup is a guided, non-silent process that asks for media handling preferences and a default output directory.

  • Respects existing system configurations by supporting a hierarchy of extension files (EXTEND.md) to manage persistent settings across projects.

  • Provides a robust CLI interface that can be triggered directly or via agent interaction for automated batch scraping tasks.

  • Headless Chrome instances consume system resources; make sure enough memory is available before running concurrent tasks.

  • Recommended for users who need structured data capture from websites that require authentication or sophisticated layout parsing.
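
The media link rewriting mentioned in the list above can be sketched as a single pass over the markdown that swaps each remote URL for the path of its downloaded copy. This is a minimal illustration under stated assumptions: `rewriteMediaLinks` and the URL-to-path map are hypothetical names invented for this example, only the standard `![alt](url)` image syntax is handled, and the skill's actual implementation may work differently.

```typescript
// Rewrite markdown image links to point at downloaded local copies.
// `localPaths` maps a remote media URL to the local path it was saved under;
// both the map shape and the function name are illustrative assumptions.
function rewriteMediaLinks(
  markdown: string,
  localPaths: Map<string, string>,
): string {
  // Matches ![alt](url) and ![alt](url "title").
  // Group 1 is the alt text, group 2 the URL, group 3 the optional title.
  const imagePattern = /!\[([^\]]*)\]\((\S+?)(\s+"[^"]*")?\)/g;

  return markdown.replace(
    imagePattern,
    (match: string, alt: string, url: string, title?: string) => {
      const local = localPaths.get(url);
      // Leave the link untouched when no local copy exists for this URL.
      return local ? `![${alt}](${local}${title ?? ""})` : match;
    },
  );
}

// Usage: only the URL present in the map is rewritten.
const downloaded = new Map([["https://example.com/a.png", "imgs/a.png"]]);
const rewritten = rewriteMediaLinks(
  "Intro ![logo](https://example.com/a.png) and ![other](https://example.com/b.png)",
  downloaded,
);
```

Leaving unmatched links unchanged means a failed download still yields valid markdown that points at the original remote file.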

Repository Stats

Stars: 16,764
Forks: 1,953
Open Issues: 1
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: Apr 29, 2026, 08:53 AM