eval-harness
Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Guidance on React Server Components (RSC) in Next.js, covering server/client component boundaries, data fetching, and composition patterns.
Token-efficient codebase analysis skill for call graphs, semantic search, impact analysis, and data flow. Uses roughly 95% fewer tokens than reading raw files.
Standardized React UI patterns for loading states, error handling, and data fetching to ensure consistent UX and robust component architecture.
CLI-only iOS development agent for Swift, SwiftUI, and UIKit. Handles the full lifecycle: build, debug, test, and release without Xcode.
A specialized decision-making agent for complex architectural choices, task planning, and error resolution within the orchestration system.
Vitest testing patterns for reliable unit and integration tests. Focuses on critical business logic, edge cases, and mocking strategies for high-impact functions.
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.
Refactors SwiftUI views for clean architecture, consistent property ordering, efficient dependency injection, and correct @Observable state management.
Autonomous recursive execution engine for indiiOS that manages task completion, state verification, and error handling.
Structured problem-framing tool for design sprints and product strategy. Facilitates collaborative or individual sessions to define goals, stakeholders, constraints, and pain points before solution generation.
Implementation patterns for MERIDIAN autonomous AI agents using Claude API, including BaseAgent lifecycle, structured tool use, token budget enforcement, and cron scheduling.