evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
557 skills found
Repository implementation guide for local-skills-mcp. Provides technical documentation on MCP tool handlers, skill loading, aggregation logic, and project structure for developers.
Perform internet searches using the Zhipu AI web search API to retrieve real-time information, news, and current data.
Create, test, and validate custom Semgrep rules for security vulnerabilities and code pattern detection.
A framework for building modular, reusable agent skills. Provides guidelines for structuring SKILL.md, bundled scripts, references, and assets to extend Claude's capabilities.
Scaffold and implement authentication in TypeScript/JavaScript apps using Better Auth. Detects frameworks, configures database adapters, sets up route handlers, adds OAuth providers, and scaffolds UI pages.
Develop and maintain the PWAFire library: build PWA API modules, handle feature detection, manage testing, and contribute to the codebase following its strict sync/async patterns and error-handling requirements.
A security scanner for Claude Skills to detect malicious code, data exfiltration risks, and unauthorized system access before installation.
Implement React 19 patterns: React Compiler, Server Actions, Forms, and new hooks like 'use'. Guides the choice between Actions and TanStack Query for mutations.
Build interactive, hypermedia-driven web applications using Rust, Axum, and HTMX for dynamic, real-time UI updates without complex JavaScript frameworks.
Integrate Snowflake with MCP clients. Manage Snowflake endpoints, validate connectivity, and leverage Cortex AI (Search, Analyst, Agent) services directly within your AI workflow.
Bridge assets from EVM chains to Starknet, deploy agent accounts, and register identities with the HuginnRegistry for autonomous AI agent onboarding.