evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Convert clinical text to natural, empathetic speech using ElevenLabs for patient instructions, medication reminders, and accessible health content.
Technical writing specialists for functional and API documentation. Dispatch to create compliant guides, conceptual docs, and API references following the ORCHESTRATOR principle.
Analyzes Markdown files to identify token-wasting patterns and provides actionable suggestions for optimizing documentation for LLM consumption.
Semantic code analysis guide for Serena MCP. Automatically prioritizes Serena tools for symbols, references, and code memory to optimize context and efficiency.
Physical hardware synthesis bridge for PAI. Generates blueprints, 3D printing code, SVG paths for laser cutting, and G-code for CNC machining to bring agentic designs into the physical world.
Analyze search engine results pages (SERPs) to classify user intent, identify feature opportunities, and conduct competitive intelligence for content strategy.
Manage YNAB budgets, track spending, and automate financial reports via API. Features include transaction logging, goal monitoring, and automated budget analysis.
Advanced Gemini-powered web search plugin with smart caching, subagent context isolation, and automated query optimization.
Interactive Archon integration for knowledge base and project management. Features RAG-powered semantic search, website crawling, document versioning, and hierarchical task management via REST API.
Validates and coordinates batch study guide operations, preventing errors by enforcing template compatibility, file availability, and source-only policies before agent execution.
Implement production-grade AI agents with LangGraph, tool-calling guardrails, SSE streaming, and episodic memory. Includes anti-patterns, fix pairs, and stateful architecture patterns.