
openai-whisper

Local speech-to-text transcription using the OpenAI Whisper CLI, providing private, high-accuracy audio processing without external API keys.

Introduction

The openai-whisper skill lets OpenClaw users run high-quality speech-to-text transcription locally. Because OpenAI's Whisper model runs on your own machine, sensitive audio such as meeting recordings or personal voice notes never leaves your device for processing. This removes the dependency on cloud-based speech-to-text (STT) services, so no API keys or subscription fees are needed and you keep full control over your data.

  • Local-first transcription: Performs all audio processing offline using local hardware resources.

  • Model selection flexibility: Supports the Whisper model family (tiny, base, small, medium, large, and turbo), so you can trade processing speed against transcription accuracy to suit your hardware.

  • Format versatility: Writes transcripts as TXT, SRT, VTT, TSV, or JSON, so results fit directly into subtitle or documentation workflows.

  • Translation capabilities: Translates speech in other languages into English text (Whisper's translate task targets English only), which helps with cross-lingual documentation.

  • No API key requirements: Runs independently of cloud infrastructure, which suits users with strict data security requirements.

  • To use this skill, make sure the Whisper CLI is installed (for example with pip install -U openai-whisper) and that ffmpeg is available, since Whisper uses it to decode audio. The first run downloads the model weights to a local cache directory, by default ~/.cache/whisper.

  • For faster processing on consumer hardware, use the default turbo model or a smaller variant such as base or small; for higher accuracy on difficult audio, choose a larger model such as medium or large.

  • Provide audio in common formats such as MP3, M4A, or WAV. Whisper decodes input through ffmpeg, so most formats ffmpeg can read will work.

  • Use the task parameter to choose between transcription and translation. Setting the task to 'translate' produces an English translation of non-English speech. The turbo model ignores this setting and returns the original language, so use medium or large for translation.

  • Ideal for transcribing voice memos, podcasts, interviews, and meetings in a secure, local-only environment. Once the model weights are cached, no internet connection is required.
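
The usage notes above can be sketched in Python. This is a minimal illustration, assuming the whisper executable from the openai-whisper package is on PATH; build_whisper_command and default_model_cache_dir are hypothetical helpers, not part of this skill or of Whisper, and the skill's own invocation may differ. The sketch only assembles an argument list for the CLI's --model, --task, --output_format, --output_dir, and --language options. It refuses turbo combined with translate, because turbo does not translate, and it mirrors Whisper's default cache location (XDG_CACHE_HOME, falling back to ~/.cache, plus a whisper subdirectory).

```python
import os
import shlex
from typing import List, Optional, Sequence

# Model names accepted by the openai-whisper CLI's --model option.
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
    "large", "large-v3-turbo", "turbo",
}
# Choices for --output_format ("all" writes every format).
OUTPUT_FORMATS = {"txt", "vtt", "srt", "tsv", "json", "all"}
TASKS = {"transcribe", "translate"}


def default_model_cache_dir() -> str:
    """Hypothetical helper: where Whisper caches downloaded weights by default."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return os.path.join(base, "whisper")


def build_whisper_command(
    audio_paths: Sequence[str],
    model: str = "turbo",
    task: str = "transcribe",
    output_format: str = "txt",  # assumption: the CLI itself defaults to "all"
    output_dir: str = ".",
    language: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for the whisper CLI without running it."""
    if not audio_paths:
        raise ValueError("at least one audio file is required")
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    # turbo returns the source language even with --task translate,
    # so this sketch rejects the combination instead of silently ignoring it.
    if task == "translate" and model in {"turbo", "large-v3-turbo"}:
        raise ValueError("use medium or large for --task translate")

    cmd = ["whisper", *audio_paths,
           "--model", model,
           "--task", task,
           "--output_format", output_format,
           "--output_dir", output_dir]
    if language is not None:
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    cmd = build_whisper_command(["meeting.m4a"], model="medium", task="translate",
                                output_format="srt", language="de")
    print(shlex.join(cmd))
    print(default_model_cache_dir())
```

To execute the command, pass the list to subprocess.run(cmd, check=True). Building an argument list rather than a shell string avoids quoting problems with file names that contain spaces.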

Repository Stats

Stars: 366,037
Forks: 75,046
Open Issues: 6,971
Language: TypeScript
Default Branch: main
Last Synced: Apr 29, 2026, 06:58 AM