evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Optimize agent context windows through KV-caching, observation masking, summarization-based compaction, and context partitioning to reduce costs and latency.
Process massive files and large codebases (10M+ tokens) by recursively chunking, sub-querying, and aggregating results to overcome LLM context limits.
Best practices for building integrations with NetBox REST and GraphQL APIs. Optimize performance, authentication, and architectural patterns for NetBox automations.
An expert-level CTF solver agent that automates reconnaissance, vulnerability analysis, and exploit generation for web, pwn, crypto, reverse-engineering, and forensics challenges.
Transcribe audio files directly into text using the OpenAI Whisper API within OpenClaw.
Operate Railway infrastructure: manage projects, services, databases, object storage, deployments, environments, variables, logs, and performance metrics.
Manage your Anki flashcards via the AnkiConnect HTTP API. Create, update, search, and organize decks, notes, and cards directly through your AI agent.
Python coding assistant providing best practices, PEP 8 enforcement, automated testing with pytest, and dependency management using uv.
Automated runtime-observability changelog for Claude Code development sessions that tracks file changes, test results, and git commits.
Build AI agents with tool calling and multi-step reasoning. Generate, manage, and orchestrate custom skill files for Claude Code, Cursor, Cline, and other AI assistants to standardize your development workflows.
Terminal-based Spotify playback and search controller for OpenClaw.