evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Architect production-grade LLM applications using LangChain 1.x and LangGraph. Implement stateful AI agents, multi-step workflows, and custom memory systems for complex conversational and automation tasks.
Manage the full lifecycle of blog posts, from initial concept and outlining to drafting and editorial refinement for Nuxt/Vue developers.
Token-efficient virtual task management for AI-assisted development. Manage task lifecycles, dependencies, and TDD workflows with surgical context injection.
Captures session learnings into Reusable Intelligence Infrastructure (RII). Converts one-time bug fixes and pattern discoveries into permanent agent-executable knowledge to prevent recurrence and accelerate future development.
Synchronizes and maintains CLAUDE.md and README.md documentation hierarchy across a repository to ensure consistent, just-in-time context for AI agents.
Validates Skill, Agent, and Command syntax using validate_skills.py, logs errors, and manages the automated QC workflow for agent development.
Statistical visualization library for Python. Create publication-quality graphics like box plots, heatmaps, and violin plots with pandas integration and automatic statistical estimation.
Visualize Azure cloud infrastructure, map resource dependencies, and generate architecture diagrams using Mermaid and PlantUML.
Shared memory and collaboration layer for AI coding agents to track actions, manage sessions, detect conflicts, and preserve project context across tools.
A systematic, multi-angle web research agent. Use for deep investigation, complex queries, and as a mandatory pre-research step before content generation to ensure evidence-backed, high-quality results.
Expert consultant for designing and building high-quality, consistent AI agent skills. Guides you through discovery, architecture, and creation phases to ensure reliable, composable, and efficient skill delivery.