openai-whisper-api
Transcribe audio files directly into text using the OpenAI Whisper API within OpenClaw.
Introduction
This OpenClaw skill transcribes audio to text by sending files to the OpenAI Whisper model through the /v1/audio/transcriptions endpoint. Built for personal AI assistants, it converts audio files in several formats into readable transcripts from the local terminal. Typical uses include documenting meetings, summarizing voice notes, and processing media content within a self-hosted AI workflow. Because the base URL is configurable, requests can also be routed through a local gateway, an OpenAI-compatible proxy, or a custom API endpoint.
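As a rough illustration of what a request to that endpoint involves, the sketch below builds the target URL from a configurable base URL and collects the form fields that accompany the audio file. The function names are hypothetical and this is not the skill's own code. It assumes a base URL may be supplied with or without a trailing /v1. The field names (model, language, prompt, response_format) are those of the OpenAI transcription API.

```python
from typing import Dict, Optional

DEFAULT_BASE_URL = "https://api.openai.com"  # OpenAI's public API host
ENDPOINT_PATH = "/v1/audio/transcriptions"   # endpoint named in the introduction


def transcription_url(base_url: Optional[str] = None) -> str:
    """Join a configurable base URL with the transcription endpoint path.

    Assumption: the base URL may or may not already end in "/v1";
    both forms resolve to the same endpoint.
    """
    base = (base_url or DEFAULT_BASE_URL).rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base + ENDPOINT_PATH


def transcription_fields(
    model: str = "whisper-1",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "text",
) -> Dict[str, str]:
    """Build the non-file form fields of a multipart transcription request."""
    fields = {"model": model, "response_format": response_format}
    if language:
        fields["language"] = language  # e.g. an ISO-639-1 code such as "de"
    if prompt:
        fields["prompt"] = prompt      # e.g. speaker names or technical terms
    return fields
```

The audio file itself would travel as the multipart field named file. Optional fields are left out unless set, so the API falls back to its own defaults such as automatic language detection.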
- Integration with the OpenAI whisper-1 model for speech-to-text conversion.
- Support for multiple audio formats, including .m4a, .ogg, and other common voice recording extensions.
- Configurable output: plain text files, or structured JSON for further programmatic processing.
- Automatic language identification, plus a forced language setting to improve accuracy for non-English content.
- Customizable prompts that supply context, such as speaker names or technical vocabulary, to refine the model's output.
- Centralized authentication through environment variables or the standard OpenClaw configuration file.
- To get started, run the transcription script with the path to a local audio file: {baseDir}/scripts/transcribe.sh /path/to/file
- Set OPENAI_API_KEY either in your ~/.openclaw/openclaw.json configuration or as a system environment variable.
- Pass the --json flag to receive metadata alongside the transcript, which is useful for automated data pipelines.
- If you run a local proxy or gateway service for privacy, redirect requests by setting OPENAI_BASE_URL.
- The tool is designed for single-user, local-first operation, so transcription stays tightly integrated with your OpenClaw personal assistant setup.
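The usage notes above can be sketched as a small wrapper for calling the script from a pipeline. This is an illustration under several assumptions, not the skill's documented interface: the helper names are hypothetical, --json is assumed to be accepted after the file path, OPENAI_BASE_URL is assumed to be read from the environment, and the JSON output is assumed to carry the transcript under a "text" key, as the OpenAI transcription endpoint's JSON response does.

```python
import json
import os
from typing import Dict, List, Optional


def build_command(base_dir: str, audio_path: str, as_json: bool = False) -> List[str]:
    """Assemble the argument list for scripts/transcribe.sh."""
    script = f"{base_dir.rstrip('/')}/scripts/transcribe.sh"
    cmd = [script, audio_path]
    if as_json:
        cmd.append("--json")  # assumed position: after the audio path
    return cmd


def build_env(api_key: str, base_url: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment and add the settings the skill reads."""
    env = dict(os.environ)
    env["OPENAI_API_KEY"] = api_key
    if base_url:
        env["OPENAI_BASE_URL"] = base_url  # assumed to be an environment variable
    return env


def extract_transcript(raw_json: str) -> str:
    """Return the transcript from JSON output (assumes a top-level "text" key)."""
    return json.loads(raw_json)["text"].strip()
```

The command and environment can then be handed to subprocess.run(cmd, env=env, check=True). extract_transcript takes a plain string, so it works the same whether the JSON was read from an output file or captured from the process.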
Repository Stats
- Stars: 366,006
- Forks: 75,041
- Open Issues: 6,962
- Language: TypeScript
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 29, 2026, 06:02 AM