audio

Generate high-quality audio with support for ElevenLabs, OpenAI, and Google Text-to-Speech. Features voice cloning, multilingual capabilities, and flexible CLI controls.

Introduction

The Audio Generation skill provides a single CLI for converting text into natural-sounding speech. Designed for AI coding agents and developers, it hides the differences between several text-to-speech (TTS) provider APIs behind one set of commands. It suits interactive applications, automated narration systems, and accessibility tools that need high-fidelity synthetic audio output.

  • Multi-provider support: Switch between ElevenLabs for voice cloning and natural synthesis, OpenAI for its tts-1 and tts-1-hd models, and Google Text-to-Speech for broad international language coverage.

  • Native CLI implementation: Written in TypeScript and built on the runtime's native fetch, avoiding heavy external HTTP library dependencies.

  • Flexible voice management: List the voices available for each provider so users can pick one that suits their use case.

  • Multilingual capability: Uses the providers' multilingual models to support a wide range of languages and localized accents.

  • High-quality output: Configurable settings for various audio formats and models (e.g., eleven_multilingual_v2, tts-1-hd).

  • To get started, set the ELEVENLABS_API_KEY, OPENAI_API_KEY, and GOOGLE_API_KEY environment variables to valid API keys.

  • Generate audio by specifying --provider, --text, and --voice, or list the available voices with the voices command.

  • The tool targets the Bun 1.0+ runtime for fast execution in CI/CD pipelines and local development environments.

  • Constraints: Requires active API subscriptions for the respective providers. When processing large batches of files, make sure storage and any audio playback dependencies are in place.

  • Practical tips: Use the --output flag to define specific file paths and naming conventions for your generated assets; chain this skill with other agentic workflows to automate content narration.
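
The environment setup described above can be sketched as a small pre-flight check. This is a minimal illustration, not the skill's actual code: the names `Provider`, `API_KEY_VARS`, and `missingKeys` are hypothetical, and only the three environment variable names come from the description.

```typescript
// Hypothetical helper: Provider, API_KEY_VARS, and missingKeys are
// illustrative names and are not taken from the skill's source.
type Provider = "elevenlabs" | "openai" | "google";

// Environment variable names as listed in the description.
const API_KEY_VARS: Record<Provider, string> = {
  elevenlabs: "ELEVENLABS_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_API_KEY",
};

// Returns the providers whose API key is unset or blank in `env`.
function missingKeys(env: Record<string, string | undefined>): Provider[] {
  return (Object.keys(API_KEY_VARS) as Provider[]).filter((provider) => {
    const value = env[API_KEY_VARS[provider]];
    return value === undefined || value.trim() === "";
  });
}
```

In a Bun script, calling `missingKeys(process.env)` before any synthesis request would show which providers cannot be used yet. Taking the environment as a parameter keeps the check easy to test.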

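As an illustration of the native-fetch approach mentioned in the feature list, the sketch below builds a request for OpenAI's speech endpoint (`POST /v1/audio/speech`, which takes `model`, `input`, and `voice` in a JSON body). The `SpeechRequest` type and `buildOpenAiSpeechRequest` function are hypothetical names chosen for this example; how the skill itself structures its requests is not shown in the description.

```typescript
// Hypothetical request builder; the type and function names are illustrative.
interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request for OpenAI's text-to-speech endpoint.
function buildOpenAiSpeechRequest(
  apiKey: string,
  text: string,
  voice: string,
  model: "tts-1" | "tts-1-hd" = "tts-1",
): SpeechRequest {
  if (text.trim() === "") {
    throw new Error("text must not be empty");
  }
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The endpoint expects the text under the `input` field.
    body: JSON.stringify({ model, input: text, voice }),
  };
}
```

The returned object can be passed directly as the second argument to `fetch(req.url, req)`. The response body is binary audio, which a CLI would then write to the path given by --output.
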
Repository Stats

Stars: 0
Forks: 0
Open Issues: 0
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: May 4, 2026, 12:09 AM