evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Rigorous, non-performative code review reception for AI agents, prioritizing technical verification and YAGNI over passive agreement.
Audit Packmind documentation by cross-referencing MDX files against the codebase to detect broken links, outdated CLI references, and missing coverage.
A systematic code auditing framework for identifying technical debt, security vulnerabilities, dead code, and code quality issues in software projects.
Guide for creating and managing Sindri declarative YAML extensions, including capabilities for project-init, auth, lifecycle hooks, and MCP integration.
Controls a local or remote headless browser for automated web navigation, data extraction, form interaction, and testing from sandboxed environments.
Search and reference Chromium documentation, including design docs, APIs, and development guides. Use to locate, browse, or learn about architecture, GPU, network, security, and testing concepts within the Chromium codebase.
Debug failing GitHub Actions CI checks by fetching logs, summarizing failures, and planning fixes.
A standardized workflow for converting raw PM notes, workshops, or rough drafts into polished, validated, and repository-compliant AI skills.
A framework to transform experimental ML prototypes into robust, production-ready Python packages using src layout, hybrid architecture, and strict configuration management.
Create robust, scalable, and maintainable technical implementation plans for complex software projects.
Standardized Git workflow for pattern development in a collaborative community repository, including rebase strategies, pull request creation, and upstream synchronization.