gemini-video-understanding
Perform advanced video analysis using Google's Gemini API: summarize content, transcribe audio, extract timestamps, clip segments, and analyze YouTube URLs or local files with support for multiple models and long contexts.
Implement Google Gemini API vision capabilities for image/document analysis including captioning, object detection, segmentation, and multi-image comparison.
A specialized decision-making agent for complex architectural choices, task planning, and error resolution within the orchestration system.
Set up Upstash Vector DB and use its semantic search, namespaces, and embedding models. Suited to building high-performance vector search features in Next.js 16/Vercel projects.
Architect and optimize production-grade RAG systems. Master embedding models, vector databases, chunking strategies, and retrieval pipelines for high-accuracy LLM applications.
A structured personal operating system for managing digital presence, knowledge, relationships, and goals with AI assistance for founders, creators, and professionals.
Generate images using the Cloudflare Workers AI flux-1-schnell model. Enables text-to-image capabilities directly within your workflow.
Fetch and parse transcripts from YouTube and Bilibili videos for summarization, QA, and content extraction using yt-dlp.
High-performance document intelligence library for extracting text, tables, code, and metadata from 91+ file formats, with OCR and LLM-ready output.
Deep document structure analysis and intelligent content extraction for knowledge bases.
Guidelines for curating high-quality datasets for LLM post-training (SFT/DPO/RLHF), covering data formats, quality filtering, and collection strategies.
Aggressively prune grammatical scaffolding and filler text from inputs to optimize LLM token usage while retaining core semantic content.