ai-multimodal
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs using long context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
Generate high-quality visual content, characters, and scenes using structured JSON prompts and automated Python execution for guided image synthesis.
Enhance image quality, resolution, and sharpness for screenshots and digital media. Perfect for professional documentation, blogs, and presentations.
Create aesthetically beautiful interfaces using systematic design principles, AI-driven evaluation, and automated inspiration analysis.
Implement Google Gemini API vision capabilities for image and document analysis, including captioning, object detection, segmentation, and multi-image comparison.
Find, review, and remove duplicate or near-duplicate images in FiftyOne datasets using computer vision similarity embeddings.
Generate professional PowerPoint presentations using AI. Create full-bleed, high-resolution slide decks from topic prompts with Gemini-powered narrative planning and image generation.
Gemini-powered UI design review, accessibility auditing, and design system validation tool for software agents.
Creates professional, editable PowerPoint (.pptx) presentations with AI-generated full-slide images, brand consistency, and style references.
Automated screenshot-to-knowledge workflow for Enzo. Captures, categorizes, extracts content, and logs patterns from screenshots to build a structured reference library.
AI-powered food calorie and nutrient calculator. Uses vision recognition to identify meals, calculate macronutrients, and provide health suggestions based on a built-in nutrition database.
Generate high-quality images via a local ComfyUI instance. Perfect for private workflows and professional-grade AI image synthesis.