evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
157 skills found
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Universal CLI tool to convert and synchronize AI agent skills between Claude Code and Gemini CLI extensions.
Normalizes testing defect logs by correcting typos, abbreviations, and ambiguous descriptions based on product-specific codebooks and station validation.
Epistemic safety analysis for JSON data in prompts to prevent LLM hallucinations and reasoning errors when handling incomplete or large-scale datasets.
Extract and document authentic writing voice from samples. Create comprehensive voice guides for AI training, ghostwriting, and brand consistency.
Create a hypothesis-based proto-persona using research, market signals, and team insights to align product teams before full validation.
End-to-end startup idea validation using S.E.E.D. niche checks, STREAM 6-layer analysis, and Devil's Advocate inversion to generate PRDs.
A design system and anti-pattern guide to make AI-generated UI look human-crafted. Ensures professional aesthetics by managing color, typography, spacing, and animations for the Toh Framework.
Orchestrates multi-agent iterative refinement for high-quality OpenClaw skill development, ensuring rigorous testing and lifecycle management.
Intelligent contract review tool for identifying risks, extracting key terms, and flagging unusual clauses to support informed decision-making.
Deep document structure analysis and intelligent content extraction for knowledge bases.
Analyze Claude Code session history to identify inefficiencies, optimize token usage, and suggest workflow improvements.