evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
193 skills found
Private skill distribution system for managing agentic resources across devices and teams. Install, sync, add, and update your agents, skills, and prompts via a central library catalog.
Development and maintenance of the PWAFire library: build PWA API modules, handle feature detection, manage testing, and contribute to the codebase following strict sync/async patterns and error handling requirements.
Automates research resource preparation by loading instances, searching GitHub for codebases, building dataset descriptions, and downloading arXiv papers.
Virtual machine development expert focusing on bytecode design, stack-based/register-based VM implementation, memory management, and garbage collection.
Search and retrieve AI-generated documentation, architecture guides, and API references for 300+ popular GitHub repositories using DeepWiki and MCP.
Guidelines for testing HashQL code using compiletest (UI tests), unit tests, and insta snapshots. Includes commands for --bless, annotation syntax, and strategies for compiler components.
A specialized skill for surgical code refactoring. Improves maintainability, reduces technical debt, and applies design patterns without altering external behavior.
Optimize Node.js performance via Redis caching, clustering, profiling, and monitoring to build fast, scalable, and efficient backend services.
Your personal AI coding tutor that creates customized tutorials based on your actual codebase, tracks your learning progress, and uses spaced repetition to ensure mastery.
UI component patterns and touch input handling for M5Stack Tab5 applications using M5GFX and LVGL.
Standardizes Fish shell configuration, scripting patterns, and system management for dotfiles environments.