ai-multimodal
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs using long context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Automate clinical report generation including CARE-compliant case reports, diagnostic summaries, clinical trial documentation (CSR/SAE), and patient notes with regulatory compliance.
Comprehensive Python healthcare AI toolkit for clinical data processing, medical coding translation, and developing deep learning models like RETAIN and Transformers for EHR, physiological signals, and clinical prediction tasks.
Generate high-quality visual content, characters, and scenes using structured JSON prompts and automated Python execution for guided image synthesis.
Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples.
Generate or edit images using AI models like FLUX and Gemini. Ideal for photos, illustrations, concept art, and visual assets, excluding technical diagrams and schematics.
Access AI-ready datasets, benchmarks, and molecular oracles for drug discovery, including ADME, toxicity, DTI, and molecular generation tasks.
Retrieve PubMed scientific literature automatically and generate plain-language summaries of biomedical research.
Generate and edit images, diagrams, and infographics using Google's Gemini 3 Pro model. Supports text-to-image, style transformation, and data-accurate visual creation.
Find, review, and remove duplicate or near-duplicate images in FiftyOne datasets using computer vision similarity embeddings.
Quick-reference guide and utility skill for Helm chart development, template syntax, and Kubernetes application deployment.
Implement Google Gemini API vision capabilities for image/document analysis including captioning, object detection, segmentation, and multi-image comparison.