advanced-evaluation
Implement production-grade LLM-as-a-judge pipelines for model evaluation, including pairwise comparison, direct scoring, bias mitigation, and rubric generation.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Synthesize interview transcripts into a structured template including Jobs to Be Done (JTBD), satisfaction signals, and actionable items.
Coverage-guided fuzzer for Ruby code and C extensions, powered by libFuzzer and address sanitizers to detect memory corruption and undefined behavior.
Perform network protocol reverse engineering, including packet capture, traffic analysis, protocol dissection, and custom format documentation.
Scans Solana programs (native and Anchor) for six critical vulnerabilities, including arbitrary CPI, improper PDA validation, and missing ownership checks, and provides detailed fix recommendations.
A framework for crafting suspense, detective, and mystery narratives, emphasizing fair play principles, clue placement, and plot structure.
Enhance image quality, resolution, and sharpness for screenshots and digital media. Perfect for professional documentation, blogs, and presentations.
Synchronizes and maintains CLAUDE.md and README.md documentation hierarchy across a repository to ensure consistent, just-in-time context for AI agents.
Generate and update PyTorch-compliant function and method docstrings using reStructuredText/Sphinx conventions.
Production-grade React 19 and TypeScript patterns featuring hooks, state management, TanStack Query, form validation with Zod, and performance optimization workflows.
An automated meta-learning skill that improves agent workflows by capturing patterns, failures, and shortcuts after each task execution.
Create and train custom reinforcement learning plugins for autonomous agents using 9 core algorithms including Decision Transformer and Actor-Critic for self-optimizing behavior.