gemini-audio
Implement Google Gemini API audio capabilities: process, transcribe, and summarize audio files, analyze environmental sounds, and generate natural speech with controllable TTS.
Transcribe audio files directly into text using the OpenAI Whisper API within OpenClaw.
Manage automatic model routing for Higress AI Gateway via CLI. Configure triggers for intelligent model selection based on request content.
Production-ready audio/video transcription using OpenAI Whisper. Features model selection, timing synchronization, speaker diarization, and batch processing for media workflows.
Extract text from images using the Tesseract OCR engine, supporting multiple languages, image preprocessing, and various formats.
Analyze meeting transcripts to uncover behavioral patterns, communication insights, and leadership feedback. Identify conflict avoidance, filler words, speaking ratios, and active listening to improve your professional presence.
ElevenLabs text-to-speech engine for OpenClaw with macOS-style CLI and voice synthesis control.
Extract and document authentic writing voice from samples. Create comprehensive voice guides for AI training, ghostwriting, and brand consistency.
Download Instagram Reels, extract metadata, and generate full audio transcripts using Groq Whisper. Supports TikTok and YouTube Shorts via yt-dlp.
Analyze YouTube videos with automated transcript extraction, AI-powered summarization, Korean translation, and interactive multi-level comprehension quizzes.
Connect your AI agent to the Hugging Face Hub via MCP. Search models, datasets, and papers, manage repos, run cloud compute jobs, and invoke Gradio Spaces as functional AI tools.
Generate real-time AI podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model with WebSocket streaming, complete with PCM-to-WAV conversion and frontend playback integration.