image-generation
Generate high-quality visual content, characters, and scenes using structured JSON prompts and automated Python execution for guided image synthesis.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
136 skills found
Process and manipulate images using ImageMagick. Supports resizing, format conversion, batch processing, and retrieving image metadata for developers and automated workflows.
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs with large context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
Implement Google Gemini API vision capabilities for image and document analysis, including captioning, object detection, segmentation, and multi-image comparison.
Robot perception system design, configuration, and optimization for cameras, LiDAR, and sensor fusion pipelines. Includes camera calibration, 3D reconstruction, and production deployment best practices.
High-performance document intelligence library for extracting text, tables, code, and metadata from 91+ file formats, with OCR and LLM-ready output.
An end-to-end video processing pipeline that transforms raw recordings into transcripts, key insights, short clips, and polished articles.
An autonomous UI implementation agent that converts Figma designs into pixel-perfect code using Figma MCP and browser-based refinement.
A unified document processing gateway for PDF parsing, text extraction, conversion, and document manipulation across multiple local and cloud providers.
Generate or edit images using AI models like FLUX and Gemini. Ideal for photos, illustrations, concept art, and visual assets, excluding technical diagrams and schematics.
Generate professional visual assets including app icons, logos, banners, and illustrations using the Nano Banana Pro (Gemini 3 Pro) AI model.
ElevenLabs text-to-speech engine for OpenClaw with macOS-style CLI and voice synthesis control.