video-watcher

Fetch and parse transcripts from YouTube and Bilibili videos for summarization, QA, and content extraction using yt-dlp.

Introduction

The video-watcher skill lets AI agents and power users work with audiovisual content as text. It uses the widely adopted yt-dlp command-line tool to extract closed captions (CC) and auto-generated transcripts from YouTube and Bilibili, two major platforms for long-form educational, technical, and creative video. The skill turns a video link into clean, readable text, a preprocessing step for downstream tasks such as summarization, semantic search, and information retrieval.
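The fetch-and-clean step described above can be sketched in Python as follows. This is a minimal sketch, not the skill's actual script. The function names `build_ytdlp_args`, `captions_to_text`, and `fetch_transcript` are hypothetical. The cleanup rules (dropping headers, cue numbers, timing lines, and inline tags, and collapsing the repeated lines that rolling auto-captions produce) are assumptions. The yt-dlp options it passes (`--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--sub-format`, `-o`) are standard yt-dlp flags.

```python
import html
import re
import subprocess
import tempfile
from pathlib import Path

# Matches inline markup such as <c>, </c>, <i> and YouTube word timestamps like <00:00:01.500>.
INLINE_TAG = re.compile(r"<[^>]+>")


def build_ytdlp_args(url: str, lang: str, out_dir: str) -> list[str]:
    """Assemble a yt-dlp command that saves caption files only, never the video."""
    return [
        "yt-dlp",
        "--skip-download",      # do not download the media itself
        "--write-subs",         # uploader-provided closed captions
        "--write-auto-subs",    # auto-generated captions for languages without an uploaded track
        "--sub-langs", lang,    # e.g. "en" or "zh-CN"; must match the codes the site exposes
        "--sub-format", "vtt/srt",
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),
        url,
    ]


def captions_to_text(raw: str) -> str:
    """Strip WebVTT/SRT structure and return one caption line per output line."""
    lines: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, WebVTT header lines, SRT cue numbers, and timing lines.
        # Assumption: a caption line made only of digits is treated as an SRT cue number.
        if (
            not line
            or line.startswith(("WEBVTT", "Kind:", "Language:"))
            or line.isdigit()
            or "-->" in line
        ):
            continue
        text = html.unescape(INLINE_TAG.sub("", line)).strip()
        # Rolling auto-captions repeat the previous line; keep only the first copy.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)


def fetch_transcript(url: str, lang: str) -> str:
    """Download captions for `url` into a temporary directory and return plain text."""
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(
            build_ytdlp_args(url, lang, out_dir),
            check=True, capture_output=True, text=True,
        )
        files = sorted(Path(out_dir).glob("*.vtt")) + sorted(Path(out_dir).glob("*.srt"))
        if not files:
            # yt-dlp may exit successfully even when no caption track matches,
            # so treat "no caption file written" as the error case here.
            raise RuntimeError(f"no '{lang}' captions found for {url}")
        return captions_to_text(files[0].read_text(encoding="utf-8"))
```

For example, `fetch_transcript(VIDEO_URL, "en")` returns the English transcript as newline-separated text when the video has one. Because language codes differ between platforms, running `yt-dlp --list-subs VIDEO_URL` first shows which codes a given video actually offers.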

  • Automatically detects the platform from links on youtube.com, youtu.be, bilibili.com, and b23.tv, with no manual configuration.

  • Supports multi-language subtitle extraction, allowing users to specify languages like English (en), Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), Japanese (ja), or Korean (ko) via CLI arguments.

  • Speeds up processing of large video libraries: agents can answer specific questions about a video from its transcript without watching it in full.

  • Fits into existing automation pipelines and agent workflows through simple bash commands.

  • Outputs plain text that is ready for further analysis by LLMs or text-processing tools.

  • The script requires yt-dlp to be installed and available on the system PATH.

  • It works only for videos that provide closed captions or auto-generated subtitles; if a video has no text track, the script returns an error status.

  • The default language depends on the platform: English for YouTube and Simplified Chinese for Bilibili.

  • Transcript quality is only as good as the source: it depends on the caption tracks the video provides and, for auto-generated captions, on the accuracy of the platform's captioning service.

  • Suitable for researchers, content creators, and developers who need to digest technical tutorials, lecture series, or documentary content at scale.
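The platform detection and per-platform language defaults listed above can be sketched as follows. This is an illustrative sketch under assumptions: matching on the parsed URL hostname (including subdomains such as www. or m.) is an assumed detection method, and `detect_platform`, `resolve_lang`, `PLATFORM_HOSTS`, and `DEFAULT_LANG` are hypothetical names. The default codes `en` and `zh-CN` are taken from the language codes the list mentions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical table of the hosts named in the skill description.
PLATFORM_HOSTS = {
    "youtube": ("youtube.com", "youtu.be"),
    "bilibili": ("bilibili.com", "b23.tv"),
}
# Defaults described above: English for YouTube, Simplified Chinese for Bilibili.
DEFAULT_LANG = {"youtube": "en", "bilibili": "zh-CN"}


def detect_platform(url: str) -> str:
    """Return "youtube" or "bilibili" based on the URL's host, or raise ValueError."""
    if "://" not in url:
        url = "https://" + url  # accept bare links such as "b23.tv/abc"
    host = urlparse(url).hostname or ""  # urlparse lowercases the hostname
    for platform, domains in PLATFORM_HOSTS.items():
        # Accept the bare domain or any subdomain (www., m., music., ...).
        if any(host == d or host.endswith("." + d) for d in domains):
            return platform
    raise ValueError(f"unsupported video URL: {url}")


def resolve_lang(url: str, lang: Optional[str] = None) -> str:
    """Use the caller's language if given, otherwise the platform default."""
    return lang or DEFAULT_LANG[detect_platform(url)]
```

Comparing the parsed hostname, rather than searching the whole URL for a substring, avoids false matches such as `notyoutube.com` or a YouTube link embedded in a query string. Short links (youtu.be, b23.tv) are only classified here, not expanded.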

Repository Stats

Stars: 4,437
Forks: 1,203
Open issues: 7
Language: Python
Default branch: main
Last synced: Apr 29, 2026, 01:14 PM