evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
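As a rough illustration of how a multi-dimensional rubric and a regression check can fit together, here is a minimal Python sketch. The dimension names, weights, 1-5 scale, baseline values, and tolerance are all hypothetical. The per-dimension scores stand in for whatever an LLM judge or human grader would return, and no model is actually called.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rubric: dimension name -> weight (not taken from the skill itself).
RUBRIC = {
    "factual_accuracy": 0.4,
    "completeness": 0.3,
    "instruction_following": 0.2,
    "tool_efficiency": 0.1,
}


@dataclass
class Evaluation:
    case_id: str
    # Per-dimension scores on an assumed 1-5 scale, e.g. produced by an LLM judge.
    scores: dict[str, float]


def weighted_score(scores: dict[str, float], rubric: dict[str, float] = RUBRIC) -> float:
    """Combine per-dimension scores into a single weighted average."""
    missing = set(rubric) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(rubric.values())
    return sum(rubric[d] * scores[d] for d in rubric) / total_weight


def find_regressions(
    baseline: dict[str, float],
    current: list[Evaluation],
    tolerance: float = 0.25,
) -> list[str]:
    """Return ids of cases whose weighted score fell more than `tolerance` below baseline."""
    regressed = []
    for ev in current:
        score = weighted_score(ev.scores)
        if ev.case_id in baseline and baseline[ev.case_id] - score > tolerance:
            regressed.append(ev.case_id)
    return regressed


if __name__ == "__main__":
    baseline = {"case-1": 4.5, "case-2": 4.0}
    run = [
        Evaluation("case-1", {"factual_accuracy": 4, "completeness": 4,
                              "instruction_following": 5, "tool_efficiency": 4}),
        Evaluation("case-2", {"factual_accuracy": 4, "completeness": 4,
                              "instruction_following": 4, "tool_efficiency": 4}),
    ]
    print(find_regressions(baseline, run))
```

In this run, `case-1` is flagged because its weighted score of 4.2 falls 0.3 below its baseline of 4.5, which exceeds the 0.25 tolerance. `case-2` matches its baseline exactly. Keeping the judge's raw per-dimension scores, rather than only the aggregate, makes it easier to see which dimension caused a regression.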
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Maintains a centralized architecture overview with Mermaid diagrams to document system boundaries, module dependencies, and interface contracts for onboarding and refactoring.
Fetch real-time financial signals, transmission-chain reasoning, and market confidence metrics directly from the DeepEar Lite platform.
Multi-source research tool for customer inquiries, bug investigations, and account history synthesis with source attribution and confidence scoring.
Automated LaTeX compilation, error diagnosis, and PDF verification pipeline for academic paper submissions.
Expert guide for OpenCode AI: TUI commands, CLI operations, AGENTS.md configuration, custom agent workflows, and project setup.
AI-native product management tool for startups. Features automated competitor research, gap analysis using the WINNING filter, PRD generation, and GitHub Issues integration for prioritized, signal-based roadmap planning.
Automate clinical report generation including CARE-compliant case reports, diagnostic summaries, clinical trial documentation (CSR/SAE), and patient notes with regulatory compliance.
Analyzes codebases to generate hierarchical documentation, onboarding guides, and architectural mapping, helping teams understand and document their projects efficiently.
Generates cloud architecture diagrams directly from Terraform (.tf) files. Parses HCL, maps resource dependencies, and visualizes infrastructure automatically using Eraser.
Structured reasoning tool for complex problem decomposition, step-by-step analysis, consistency verification, and evidence-based synthesis with confidence scoring.
Generate hierarchical, AI-optimized documentation structures (AGENTS.md, agent.d) to streamline codebase context, setup, and navigation for AI coding assistants and developers.