evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context-engineering effectiveness.
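The description above names rubrics, LLM-as-a-judge, and regression testing. The following is a minimal Python sketch of how they might fit together, not the skill's own implementation. The dimension names, the weights, the `judge` callable (a stand-in for an LLM-as-a-judge call), and the 0.05 regression tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance; normalised in weighted_score


# Hypothetical rubric dimensions and weights, for illustration only.
RUBRIC = [
    Criterion("correctness", 0.5),
    Criterion("completeness", 0.3),
    Criterion("efficiency", 0.2),
]

# A judge maps (task, agent_output) to per-dimension scores in [0, 1].
# In practice this might wrap an LLM-as-a-judge prompt; here it is any callable.
Judge = Callable[[str, str], dict[str, float]]


def weighted_score(scores: dict[str, float], rubric: list[Criterion] = RUBRIC) -> float:
    """Combine per-dimension scores into one weighted score in [0, 1]."""
    missing = [c.name for c in rubric if c.name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def evaluate_suite(cases: dict[str, tuple[str, str]], judge: Judge) -> dict[str, float]:
    """Score each case's (task, agent_output) pair with the judge and the rubric."""
    return {case_id: weighted_score(judge(task, output))
            for case_id, (task, output) in cases.items()}


def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.05) -> list[str]:
    """Return IDs of cases present in both runs whose score fell by more than tolerance."""
    return sorted(
        case_id for case_id, base in baseline.items()
        if case_id in candidate and base - candidate[case_id] > tolerance
    )
```

Normalising by the total weight keeps scores comparable when dimensions are added or reweighted. Comparing per-case scores against a stored baseline, rather than only the suite average, surfaces individual regressions that a higher mean could otherwise hide.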
Shopify integration to manage e-commerce data, products, orders, and customer workflows using the Membrane CLI.
Guidance for Model Context Protocol (MCP) server development, including tool design, resource handling, and AI/ML integration patterns.
AI-powered browser automation server for web interaction, data extraction, and research using the Model Context Protocol.
Generate professional, cohesive, project-specific SVG icon sets with consistent style, stroke weight, and visual density. Ideal for unique web and app UI branding.
Manage, run, and update JS framework benchmarks for the Gea framework, including reporting, HTML result generation, and performance comparisons.
Generate, validate, and refine Mermaid diagrams, including flowcharts, sequence diagrams, ERDs, and architecture maps, to visualize complex software systems and workflows.
A structured guide for novelists to navigate the seven-step writing process, from constitution and specification to planning, tasking, drafting, and quality analysis.
Analyzes Claude Code chat history to identify coding patterns and skill gaps, curates personalized learning resources from HackerNews, and sends progress reports to Slack.
A CLI for converting web content and search results into LLM-friendly formats such as Markdown, text, or HTML using the Jina AI Reader API.
Preprocessing and cleaning astronomical light curves using Lightkurve: tools for outlier removal, flattening (detrending), and quality-flag handling for time-series analysis.
Language-agnostic backend architectural patterns covering API design, authentication, security protocols, and database modeling.