ai-multimodal
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs using long context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
110 skills found
Generate high-quality images via a local ComfyUI instance. Perfect for private workflows and professional-grade AI image synthesis.
Generate high-quality visual content, characters, and scenes using structured JSON prompts and automated Python execution for guided image synthesis.
Generate and edit images using Google's Nano Banana 2 via WaveSpeed AI. Supports text-to-image, natural language editing, multi-image composition, 4K resolution, and various aspect ratios.
Automate Python scripting and Gemini-powered image generation using uv. Ideal for creating art, editing images, and running ad-hoc scripts.
Unified local ML inference server for ASR, TTS, Translation, Image Generation, and Vision on Apple Silicon, powered by MLX.
Generate and edit images using the Gemini API via the nanaban CLI. Create illustrations, logos, and icons, or perform photo edits like background removal and style transfer.
Generate images using the Cloudflare Workers AI flux-1-schnell model. Enables text-to-image capabilities directly within your workflow.
A generative agent skill for creating ASCII art, optimized for rapid, single-pass artistic output without iterative refinement.
Generate or edit images using AI models like FLUX and Gemini. Ideal for photos, illustrations, concept art, and visual assets, excluding technical diagrams and schematics.
Generate artistic 3D city-themed food diorama images using the Google Gemini API. Creates Pop Mart-style four-quadrant layouts featuring iconic dishes, cultural symbols, and city-specific heritage elements.
Generate and edit images, diagrams, and infographics using Google's Gemini 3 Pro model. Supports text-to-image, style transformation, and data-accurate visual creation.