evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
High-performance in-memory DataFrame library for Python and Rust. Features lazy evaluation, parallel execution, and an Apache Arrow backend for efficient ETL and data processing as a faster alternative to pandas.
Full-stack web development suite featuring Next.js (SSR/RSC/App Router), Turborepo for monorepo management, and RemixIcon for UI assets. Optimized for modern React, high-performance builds, and scalable architecture.
A friendly welcome skill that displays system OS details in ASCII art when triggered by casual greetings like 'hello' or 'hi'.
Master modern React state management. Learn to implement Redux Toolkit, Zustand, Jotai, and React Query for global, server, and local state.
Transform technical documentation into a growth engine. Learn to write docs that improve SEO, reduce time-to-value, and convert developers by mastering information architecture and developer-focused writing.
Symbol-level code understanding and navigation agent toolkit using LSP for precise code analysis, reference tracking, and surgical refactoring across 30+ programming languages.
Build distinctive, production-grade frontend interfaces and web components with high aesthetic quality, avoiding generic AI design patterns.
Initiates automated reverse engineering by discovering codebase architecture, layers, and technology stacks to facilitate system modernization or documentation.
Transcribe audio files (wav, mp3, ogg) to text using the Qwen ASR model. Fast, local-friendly, and requires no API keys.
Autonomous multi-team codebase improvement agent with specialized modes: narrow (goal-directed), broad (hypothesis-divergent), and sweep (quality-focused).
Explains complex concepts using master teaching frameworks like Feynman, Socratic, and Cognitive Load theory to ensure deep, clear understanding.