evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Intelligent orchestration for dispatching tasks to specialized background agents with performance-based routing and execution tracking.
Build professional, accessible, and responsive user interfaces using React, Next.js, and modern design systems like shadcn/ui. Focuses on developer tools, chat interfaces, and real-time streaming components.
Expert code review agent that performs systematic audits of git changes for SOLID violations, security vulnerabilities, performance regressions, and architectural smells.
A constitution-driven, spec-first development workflow for Claude Code and Codex, automating feature planning, implementation, and quality assurance through structured agentic loops.
A framework to transform experimental ML prototypes into robust, production-ready Python packages using src layout, hybrid architecture, and strict configuration management.
Write high-quality user stories and requirement documents following the INVEST criteria.
Expert tool for auditing and validating Claude Code configurations, including skills, hooks, and commands, for structural integrity, naming conventions, and adherence to best practices.
Structured task planning framework for AI agents to break down complex features, refactors, and bugs into actionable, verifiable steps.
Guide for implementing features using architecture-first design, TDD, rich domain models, and Swift 6.2 patterns, ensuring a clean separation between Domain, Infrastructure, and App layers.
Execute the implementation planning workflow, generate technical design artifacts, and structure research tasks for Spec Kit projects.
Intelligent tool selector for code search. Routes queries between semantic search (claudemem) and native tools (Grep/Glob) to optimize efficiency, token usage, and search accuracy.