video-subtitle-cutter

Automate video editing: transcribe with Whisper, use AI to identify filler words, pauses, and mistakes, then generate FFmpeg commands that cut them out cleanly.

Introduction

This skill acts as a specialized video editing agent that automates the removal of non-essential audio and video segments in post-production. It is aimed at content creators, podcasters, and educators who need to turn raw footage into concise, polished media without manual frame-by-frame scrubbing. Using OpenAI's Whisper or a local transcription model, the agent converts the audio into a timestamped JSON transcript. It then uses AI to scan that transcript for filler words such as 'um' or 'uh', repeated phrases, awkward pauses, and false starts. Once the target segments are identified, the agent generates FFmpeg command sequences that cut out the unwanted parts while keeping audio and video in sync.
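The transcript-analysis step can be illustrated with a minimal sketch. It is a rule-based stand-in for the AI check, not the skill's actual logic. It assumes the transcript has been flattened into a list of words with "word", "start", and "end" keys (similar to Whisper's word-level timestamps); the function name, the filler set, and the 1.0-second pause threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

FILLERS = {"um", "uh"}  # filler words named in the text; extend as needed
MAX_PAUSE = 1.0         # assumed threshold (seconds) for an "awkward pause"


def plan_keep_segments(
    words: List[Dict], max_pause: float = MAX_PAUSE
) -> List[Tuple[float, float]]:
    """Return (start, end) spans to keep, cut on word boundaries.

    Consecutive kept words are merged into one span unless a filler
    word or a pause longer than max_pause sits between them.
    """
    segments: List[Tuple[float, float]] = []
    can_extend = False  # True only when the previous word was kept
    for w in words:
        # Normalize the token: Whisper words may carry spaces or punctuation.
        token = w["word"].strip().lower().strip(".,!?")
        if token in FILLERS:
            can_extend = False  # force a cut around the filler
            continue
        start, end = float(w["start"]), float(w["end"])
        if can_extend and start - segments[-1][1] <= max_pause:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))  # start a new span
        can_extend = True
    return segments


if __name__ == "__main__":
    sample = [
        {"word": " So", "start": 0.0, "end": 0.4},
        {"word": " um,", "start": 0.5, "end": 0.9},
        {"word": " welcome", "start": 1.0, "end": 1.5},
        {"word": " back", "start": 1.6, "end": 2.0},
        {"word": " everyone", "start": 4.0, "end": 4.6},
    ]
    print(plan_keep_segments(sample))
```

In this sample the filler "um" and the two-second silence before "everyone" each force a cut, so three keep-spans remain. A real run would replace the fixed filler set with the AI's judgment about repetitions and false starts.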

  • Automated transcription of video files using Whisper for accurate timestamp extraction.

  • AI-driven content analysis to flag filler words, repetitions, tangents, and long silences.

  • Intelligent segment planning that merges adjacent keep-segments and respects natural word boundaries.

  • Generation of FFmpeg processing scripts that never use -c copy; every cut is re-encoded so that segment boundaries play back smoothly.

  • Built-in quality management through adjustable CRF settings, balancing file size and visual fidelity.

  • Support for secondary subtitle generation (SRT) and optional hard-coded subtitle burning for final delivery.

  • The primary workflow has three steps: transcribe the video, analyze the transcript with a prompt-based AI check, and run the generated Python-based FFmpeg scripts.

  • FFmpeg and Python must be installed and correctly configured locally on macOS or Linux.

  • The tool enforces re-encoding with libx264. Stream copying can only cut cleanly at keyframes, so cuts made elsewhere tend to produce frozen frames and playback glitches.

  • Output quality is tunable through the CRF value, from 15 (visually near-lossless, larger files) to 28 (heavy compression, smaller files).

  • For large video files, extract the audio to MP3 first to speed up transcription.

  • The agent handles FFmpeg concatenation automatically, including the temporary files used to join the kept segments.
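The cutting and joining steps listed above might look roughly like the following sketch. It only builds the FFmpeg argument lists and the concat list text, and runs nothing. The function name, the part_NNN.mp4 and segments.txt file names, the default CRF of 20, and the choice of AAC audio are assumptions for illustration, not the skill's actual script. In keeping with the no -c copy rule, the join step here re-encodes as well, which is also an assumption.

```python
from typing import List, Tuple


def build_ffmpeg_commands(
    src: str,
    keep: List[Tuple[float, float]],
    output: str,
    crf: int = 20,                    # assumed default inside the 15-28 range
    list_path: str = "segments.txt",  # hypothetical temp file for the concat list
) -> Tuple[List[List[str]], str]:
    """Return (commands, concat_list_text); nothing is executed here."""
    if not keep:
        raise ValueError("no segments to keep")
    if not 15 <= crf <= 28:
        raise ValueError("crf outside the 15-28 range described above")

    # Re-encode video with libx264 at the chosen CRF; never stream-copy.
    encode = ["-c:v", "libx264", "-crf", str(crf), "-c:a", "aac"]
    commands: List[List[str]] = []
    list_lines: List[str] = []

    for i, (start, end) in enumerate(keep):
        part = f"part_{i:03d}.mp4"  # temporary segment file
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{start:.3f}",      # seek to the segment start
            "-i", src,
            "-t", f"{end - start:.3f}",  # keep only the segment duration
            *encode, part,
        ])
        list_lines.append(f"file '{part}'")

    # Join the parts with FFmpeg's concat demuxer, reading the list file.
    commands.append([
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", list_path, *encode, output,
    ])
    return commands, "\n".join(list_lines) + "\n"


if __name__ == "__main__":
    cmds, listing = build_ffmpeg_commands(
        "raw.mp4", [(0.0, 0.4), (1.0, 2.0)], "clean.mp4", crf=18
    )
    for cmd in cmds:
        print(" ".join(cmd))
    print(listing, end="")
```

To execute the plan, one would write the list text to list_path after the per-segment commands finish, then pass each command to subprocess.run(cmd, check=True) in order and delete the temporary parts afterwards.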

Repository Stats

Stars: 221
Forks: 43
Open Issues: 5
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: Apr 30, 2026, 11:21 AM