analyze-video

Adds visual metadata to video transcripts by extracting frames with ffmpeg and analyzing them, giving LLM-based video editing workflows visual context.

Introduction

The analyze-video skill is the step in the ButterCut pipeline that turns raw video footage into structured editorial data. It uses ffmpeg to extract JPG frames at defined intervals, analyzes those frames to generate periodic visual descriptions, and appends the descriptions to the existing audio transcript. The result is a visual transcript that lets Claude or other AI agents 'see' the video and make informed editorial decisions for rough cuts and sequence creation in professional NLEs such as Final Cut Pro, Premiere Pro, and DaVinci Resolve.
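
As a rough illustration of the frame extraction step, the sketch below builds an ffmpeg command line that writes one JPG per interval. The function names, the 2-second default, the `frame_%05d.jpg` naming pattern, and the quality setting are assumptions made for this example; the skill's actual invocation may differ. Only standard ffmpeg options (`-i`, `-vf fps=...`, `-q:v`) are used.

```python
import subprocess
from pathlib import Path


def build_frame_extract_cmd(video: str, out_dir: str, interval_s: int = 2) -> list[str]:
    """Return an ffmpeg argv that writes roughly one JPG every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be a positive number of seconds")
    # Hypothetical naming scheme: frame_00001.jpg, frame_00002.jpg, ...
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", video,
        "-vf", f"fps=1/{interval_s}",  # fps filter: one output frame per interval
        "-q:v", "2",                   # JPEG quality scale, lower means better
        pattern,
    ]


def extract_frames(video: str, out_dir: str, interval_s: int = 2) -> None:
    """Create out_dir and run ffmpeg; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_frame_extract_cmd(video, out_dir, interval_s), check=True)
```

Keeping command construction separate from execution makes the arguments easy to inspect or log before any process is started.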

  • Automates frame extraction with ffmpeg for fast ingestion.

  • Seamless integration with audio transcripts generated by tools like WhisperX.

  • Produces structured JSON visual transcripts that correlate visual events with audio segments.

  • Runs up to 8 tasks in parallel, keeping CPU/GPU utilization high without saturating the system.

  • Enables AI-driven video editing: users can query footage content, search for specific visual actions, and generate sequence timelines from descriptive metadata.

  • Prerequisites: Ensure all input videos have associated audio transcripts generated by the transcribe-audio skill prior to execution.

  • Configuration: The skill operates within a defined Library structure; update library.yaml with the generated visual_transcript path upon completion.

  • Performance Note: While ffmpeg frame extraction is a rapid CPU burst, the subsequent LLM analysis is API-dependent; limit concurrent tasks to 8 to maintain stability on diverse hardware.

  • Workflow Integration: Once analysis is complete, chain this skill with summarize-video (using the Haiku model) to produce concise video summaries.

  • Ideal for editors, creators, and developers seeking to automate the rough-cut process by transforming raw assets into a searchable, metadata-rich library.
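
To make the idea of correlating visual events with audio segments concrete, here is a minimal sketch. It assumes transcript segments carry `start`, `end`, and `text` keys (as WhisperX segments do) and that each analyzed frame is a dict with hypothetical `time` and `description` keys. The `visuals` field and the function name are invented for this example and are not the skill's documented schema.

```python
def attach_visuals(segments, frames):
    """Attach each frame description to the segment whose time span contains it.

    segments: list of {"start": float, "end": float, "text": str}
    frames:   list of {"time": float, "description": str}  (hypothetical shape)

    Returns (merged_segments, unmatched_frames). Frames that fall outside every
    segment, for example during silence, are returned separately.
    """
    merged = [dict(seg, visuals=[]) for seg in segments]  # copy; inputs stay untouched
    unmatched = []
    for frame in sorted(frames, key=lambda f: f["time"]):
        for seg in merged:
            if seg["start"] <= frame["time"] < seg["end"]:
                seg["visuals"].append(frame["description"])
                break
        else:
            # No segment covered this timestamp.
            unmatched.append(frame)
    return merged, unmatched
```

The merged list contains only plain dicts, lists, strings, and numbers, so it can be written out with `json.dumps` as a visual transcript.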

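The limit of 8 concurrent tasks mentioned in the list above can be pictured with a bounded worker pool. This is only an illustration of the cap: the skill may dispatch agent tasks rather than threads, and `analyze_all` and `analyze_one` are hypothetical names for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TASKS = 8  # the limit recommended in the text above


def analyze_all(videos, analyze_one, max_workers=MAX_CONCURRENT_TASKS):
    """Call analyze_one(video) for every video, never more than 8 at a time.

    Returns a dict mapping each video to its result. An exception raised by
    analyze_one propagates to the caller.
    """
    videos = list(videos)
    # Clamp the pool size to the range 1..MAX_CONCURRENT_TASKS.
    workers = max(1, min(max_workers, MAX_CONCURRENT_TASKS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyze_one, videos))  # map preserves input order
    return dict(zip(videos, results))
```

Threads suit this kind of workload because each task spends most of its time waiting on a subprocess or an API response rather than running Python code.
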
Repository Stats

Stars: 427
Forks: 68
Open Issues: 25
Language: Ruby
Default Branch: main
Sync Status: Idle
Last Synced: May 1, 2026, 07:25 AM