evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Programmatic parametric CAD for AI agents. Create 3D models, mechanical parts, and complex geometries via MCP tools with support for boolean operations, patterns, and multi-format exports (STL, GLB, STEP).
Connect your AI agent to the Hugging Face Hub via MCP. Search models, datasets, and papers, manage repos, run cloud compute jobs, and invoke Gradio Spaces as functional AI tools.
Automated GitHub issue analysis, triage, and resolution-planning tool integrated with Specification-Driven Development (SDD) workflows.
Discover and install agent skills to extend your DeerFlow capabilities. Use this to find tools, workflows, or specialized knowledge for tasks like coding, testing, and deployment.
Generate high-quality Japanese puns (dajare) based on keywords, topics, or situations. Includes rhyme analysis and contextual humor generation.
Analyze and debug fast-agent session histories, tool execution logs, and conversation timing to resolve performance bottlenecks, tool loops, and unexpected session terminations.
Standardizes project context by managing artifacts (product, tech-stack, workflow, tracks) in a conductor/ directory. Supports project scaffolding, artifact synchronization, and AI alignment for greenfield and brownfield projects.
Maintain and update the MassGen model registry, including backend capabilities, model metadata, pricing structures, and context window configurations for new and existing AI models.
Search and reference Chromium documentation, including design docs, APIs, and development guides. Use to locate, browse, or learn about architecture, GPU, network, security, and testing concepts within the Chromium codebase.
Expert automated code review for Go CLI applications, focusing on Cobra/urfave patterns, security, performance, idiomatic Go, and robust error handling.
Validate test suite effectiveness and uncover weak assertions by introducing code mutations and measuring kill rates. Essential for proving tests genuinely catch bugs rather than just satisfying coverage metrics.