ocr
Extract text from images using the Tesseract OCR engine, supporting multiple languages, image preprocessing, and various formats.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
86 skills found
Generate professional PowerPoint presentations using AI. Create full-bleed, high-resolution slide decks from topic prompts with Gemini-powered narrative planning and image generation.
Generate high-quality images via a local ComfyUI instance. Well suited to private workflows and professional-grade AI image synthesis.
Generate and edit images using Google's Nano Banana 2 via WaveSpeed AI. Supports text-to-image, natural language editing, multi-image composition, 4K resolution, and various aspect ratios.
Design professional-grade brand identities using geometric primitives, negative space, and flat vector-style aesthetics via AI-driven branding logic.
Generate artistic 3D city-themed food diorama images using the Google Gemini API. Creates Pop Mart-style four-quadrant layouts featuring iconic dishes, cultural symbols, and city-specific heritage elements.
Generate and edit images using the Gemini API via the nanaban CLI. Create illustrations, logos, and icons, or perform photo edits like background removal and style transfer.
An automated visual note and flowchart generator. Converts text or keywords into styled diagrams, mind maps, and handwritten notes exported as images without requiring file-reading permissions.
Automated screenshot-to-knowledge workflow for Enzo. Captures, categorizes, extracts content, and logs patterns from screenshots to build a structured reference library.
A generative agent skill for creating ASCII art, optimized for rapid, single-pass artistic output without iterative refinement.
Generate realistic virtual product try-on visualizations to help customers evaluate fit, drape, and scale before purchasing.
Fetch, download, and batch process web images in various formats (JPG, PNG, WebP, SVG, etc.) for embedding, archiving, or chat integration.