evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Implement robust Rust backend services using Axum, SQLx, and thiserror with production-grade patterns.
Read, write, and manage Feishu (Lark) cloud documents. Supports markdown, block manipulation, tables, and media attachments.
A unified document processing gateway for PDF parsing, text extraction, conversion, and document manipulation across multiple local and cloud providers.
Transform passive learning content like transcripts and tutorials into actionable Ship-Learn-Next cycles with concrete implementation plans and progress-oriented quests.
A rigorous, four-phase methodology to enforce systematic root cause analysis before applying any code fixes.
Generates llms.txt and llms-full.txt files to provide LLM-friendly documentation and project context.
Maintains an automated runtime-observability changelog for Claude Code development sessions, tracking file changes, test results, and git commits.
A project-specific template skill for maintaining architectural consistency, coding standards, and deployment workflows in AI-powered full-stack applications.
Multi-perspective AI consultation for technical architecture, complex refactoring, and structured debugging.
Perform comprehensive code reviews with a focus on security vulnerabilities, performance optimization, maintainability, and code correctness.
Cascading goal tracking system connecting 3-year vision to daily tasks. Automates progress calculation, stalled goal detection, and project-to-goal alignment for Obsidian vaults.