evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
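The rubric and regression-testing ideas can be sketched roughly in Python. The sketch assumes that per-dimension scores in [0, 1] have already been produced (for example by an LLM judge, not shown). The dimension names, weights, and the 0.05 tolerance are hypothetical choices, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One rubric dimension with a relative weight (hypothetical example)."""
    name: str
    weight: float


# Hypothetical rubric: dimension names and weights are illustrative assumptions.
RUBRIC = [
    Dimension("factual_accuracy", 0.4),
    Dimension("completeness", 0.3),
    Dimension("tool_efficiency", 0.3),
]


def weighted_score(scores: dict[str, float], rubric: list[Dimension]) -> float:
    """Combine per-dimension scores (each assumed in [0, 1]) into a weighted average."""
    total_weight = sum(d.weight for d in rubric)
    return sum(scores[d.name] * d.weight for d in rubric) / total_weight


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return IDs of test cases whose candidate score fell more than `tolerance` below baseline."""
    return sorted(
        case_id
        for case_id, base in baseline.items()
        if case_id in candidate and base - candidate[case_id] > tolerance
    )


if __name__ == "__main__":
    # Aggregate judge-assigned dimension scores for one (hypothetical) test case.
    print(weighted_score(
        {"factual_accuracy": 0.9, "completeness": 0.8, "tool_efficiency": 0.7}, RUBRIC
    ))
    # Compare a new agent version against a stored baseline.
    baseline = {"case-1": 0.81, "case-2": 0.70}
    candidate = {"case-1": 0.82, "case-2": 0.60}
    print(find_regressions(baseline, candidate))  # ['case-2']
```

A small tolerance keeps ordinary judge noise from being reported as a regression. A real harness would likely tune it per dimension.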
Capture and formalize software development ideas into structured design documents within the Hashbrown repository, including research and conceptual sketches.
Generate production-ready Cloudscape Design System React + TypeScript UI code, components, and scaffolds with built-in accessibility, responsive patterns, and robust state handling.
Convert Markdown PRDs into structured prd.json files for the Ralph autonomous AI agent system, enabling repeatable, context-aware software development.
Generate images using the Cloudflare Workers AI flux-1-schnell model. Enables text-to-image capabilities directly within your workflow.
Manage long-running PapersFlow DeepScan research workflows with asynchronous monitoring, live progress tracking, and automated report generation.
Generate Claude Code Skills automatically, including structured prompts, YAML frontmatter, and the supporting file structure.
Interact with GitHub via the gh CLI to manage issues, pull requests, and workflow runs, and to run advanced API queries programmatically.
Orchestrate complex multi-agent software development using a structured Royal Navy squadron metaphor, with mission planning, parallel task coordination, and rigorous audit logs.
Optimize agent performance and token usage through advanced context compression, structured summarization, and task-oriented state management for long-running sessions.
Verify the novelty of a research idea against recent literature. Use when the user says '查新' ("novelty search"), 'novelty check', or needs to confirm whether a method is original.
Build no-code MCP servers that orchestrate tools as directed graphs defined in YAML, supporting data transformation, conditional routing, and automated workflows.