evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Validates Claude Code plugins against architectural standards, checking manifest files, frontmatter, and tool invocation patterns to ensure high-quality, compliant plugin development.
Architect production-grade LLM applications using LangChain 1.x and LangGraph. Implement stateful AI agents, multi-step workflows, and custom memory systems for complex conversational and automation tasks.
Test-driven development (TDD) workflow for Spring Boot applications using JUnit 5, Mockito, MockMvc, and Testcontainers.
Optimize Apache Spark jobs with partitioning strategies, memory management, shuffle tuning, and data skew mitigation for high-performance data processing pipelines.
A comprehensive Next.js 15 development and project management skill for Claude Code, featuring Supabase integration, RBAC, and automated quality validation.
Standardized workflow and checklist assistant for MassGen release documentation, covering changelogs, Sphinx docs, case studies, and roadmap synchronization.
Expert guide for kagent: the Kubernetes-native framework for building, deploying, and managing AI agents, MCP tools, and A2A protocols.
Creates detailed, step-by-step TDD implementation plans for software development tasks.
C programming language expert for memory management, systems programming, low-level optimization, and debugging best practices.
Agent assignment matrix, blocker escalation, and TDM coordination patterns for multi-agent software workflows.
Architects enterprise AI agents from structured specs, generating production-ready code, data flow diagrams, and platform-specific logic for ServiceNow, Salesforce, and Snowflake.