evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Manage type-safe Laravel backend routing within Inertia.js React frontend components using the Wayfinder library.
Specialized skill for creating, editing, and maintaining .drawio diagrams. Supports XML manipulation, layout optimization, AWS icon usage, and automated PNG conversion for documentation.
AI-assisted version control for code agents. Track prompts, context, and diffs automatically with MemoV to ensure full traceability without polluting your git history.
Analyze C++ code for real-time safety violations including heap allocations, locks, blocking calls, and non-deterministic operations in high-performance audio threads.
Perform advanced video analysis using Google's Gemini API: summarize content, transcribe audio, extract timestamps, clip segments, and analyze YouTube URLs or local files with support for multiple models and long contexts.
A systematic code auditing framework for identifying technical debt, security vulnerabilities, dead code, and code quality issues in software projects.
Apply reality-first coding standards: intentional naming, focused functions, guard clauses, and deterministic side effects, with no speculative features.
Correlate content attributes with GA4 and GSC metrics to identify performance drivers and optimization opportunities.
Develop and maintain the PWAFire library: build PWA API modules, handle feature detection, manage testing, and contribute to the codebase following strict sync/async patterns and error-handling requirements.
Analyze Claude Code session history to identify inefficiencies, optimize token usage, and suggest workflow improvements.
A highly customized personal garden based on Quartz v4, featuring enhanced Markdown parsing, telescopic text, TikZ/pseudocode rendering, and Obsidian integration.