evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
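As a rough illustration of what a multi-dimensional rubric with regression testing might look like, here is a minimal sketch. The dimension names, weights, tolerance, and function names are all hypothetical and are not taken from the skill itself. The per-dimension scores are assumed to arrive already normalized to [0, 1], for example from an LLM judge, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float  # relative importance; normalized when scoring


# Hypothetical rubric: dimension names and weights are illustrative only.
RUBRIC = [
    Dimension("correctness", 0.5),
    Dimension("completeness", 0.3),
    Dimension("efficiency", 0.2),
]


def rubric_score(scores: dict[str, float], rubric: list[Dimension] = RUBRIC) -> float:
    """Weighted mean of per-dimension scores, each expected in [0, 1]."""
    total_weight = sum(d.weight for d in rubric)
    return sum(d.weight * scores[d.name] for d in rubric) / total_weight


def regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return ids of test cases whose score dropped by more than `tolerance`.

    A case missing from `candidate` is treated as scoring 0.0.
    """
    return sorted(
        case
        for case, old in baseline.items()
        if candidate.get(case, 0.0) < old - tolerance
    )
```

One possible use is to compute `rubric_score` for each test case on a baseline agent and on a candidate agent, then pass the two result maps to `regressions` to flag the cases that got worse.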
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Universal SSH tool for remote server management. Execute commands, manage Docker containers, view logs, and handle server maintenance directly from your Claude session.
Expert guidance for SolidStart server runtime, including request events, middleware, server functions, and API architecture.
Local text-to-speech conversion using Kokoro TTS. Generate audio, read text aloud, and handle multilingual speech synthesis directly in your terminal.
Explains complex concepts using master teaching frameworks like Feynman, Socratic, and Cognitive Load theory to ensure deep, clear understanding.
Test web applications with screen readers like VoiceOver, NVDA, and JAWS. Validate accessibility, debug assistive technology issues, and ensure compliance with screen reader support standards.
Toolkit for testing local web applications using Playwright, featuring server lifecycle management, automated DOM inspection, and browser automation workflows.
A Svelte 5 testing expert using vitest-browser-svelte and Playwright. Provides patterns for unit, SSR, and E2E tests, plus a CLI tool for AI assistants to fetch testing patterns.
Manage personal finances with local SQLite tracking, expense categorization, budget setting, and automated reminders for recurring bills and annual expenses.
Test Adobe EDS blocks interactively in the browser with Jupyter notebooks. Features ES6 imports, overlay previews, responsive device testing, and zero-dependency execution.
Manage CI/CD workflows, Docker containerization, and infrastructure configurations for the multi-chain crypto wallet system.
Run repeatable Maven tests in RDF4J with module-specific workflows, automatic environment refreshing, and actionable failure reporting.