evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
A project-specific template skill for maintaining architectural consistency, coding standards, and deployment workflows in AI-powered full-stack applications.
Build AI agents with the OpenAI Agents SDK for Python. Supports multi-agent handoffs, function tools, stateful sessions, streaming, and Azure OpenAI integration via LiteLLM.
Create and manage production-ready Grafana dashboards for observability, real-time metrics visualization, and system monitoring.
Safely execute, test, and verify commands discovered in documentation with real output capture, performance tracking, and git-aware safety protocols.
Bayesian modeling and probabilistic programming with PyMC. Build hierarchical models, perform MCMC sampling (NUTS), variational inference, and conduct rigorous model comparison using LOO and WAIC.
Accelerate clinical and healthcare app development in Lovable. Perfect for OpenClaw Clinical Hackathon participants building MVPs with PHI-safe patterns.
Expert guidance for designing and implementing high-quality tool schemas and descriptions for Julia's agent systems, ensuring reliable tool execution and reducing model hallucinations.
Apply Holistic Testing with PACT (Proactive, Autonomous, Collaborative, Targeted) principles to build quality into team culture and test strategies for modern software systems.
Orchestrates complex multi-agent software development using a structured Royal Navy squadron metaphor, featuring mission planning, parallel task coordination, and rigorous audit logs.
Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement in AI agents.
A command-line tool for managing, building, and deploying Agent Skills as OCI artifacts within the Agent Skills ecosystem.