eval-harness
Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
483 skills found
Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.
Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement in AI agents.
Executes Gradle-based Java tests, filters results for failures and key statistics, and provides concise reports to streamline backend development and debugging.
Focus debug skill for DashPlayer: isolates log chains, injects temporary focus markers ([FOCUS:token]), and ensures clean removal of debug artifacts after task completion.
Plan mode on steroids. Push engineers to think with a product mindset before building with structured intake and concrete technical options.
Verifies blockchain smart contract code against technical specifications, whitepapers, and design documents to ensure exact implementation compliance.
Dialectical reasoning and adversarial coding agent for MCP-enabled editors, forcing LLMs to resolve internal contradictions for higher quality outputs.
AI-assisted version control for code agents. Track prompts, context, and diffs automatically with MemoV to ensure full traceability without polluting your git history.
Create structured user stories using the Mike Cohn format with Gherkin acceptance criteria to translate requirements into testable, development-ready tasks.
Design and document REST or GraphQL APIs, including endpoint definitions, pagination, filtering, versioning, and OpenAPI/Swagger specifications.
Mandatory execution-based validation for all software implementation tasks. Ensures code works through empirical verification before confirmation.
Autonomous research specialist for verified information gathering, source evaluation, and structured synthesis.