evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production applications.
Implement LlamaExtract for robust structured data extraction from PDF, DOCX, and PPTX files using Pydantic schemas.
Full-stack automated paper writing pipeline from research narrative to polished LaTeX/PDF.
Synthesizes multi-agent research findings into coherent, citation-backed reports, resolving contradictions and identifying consensus.
Query the Pollinations text API with web-search enabled models like Gemini and Perplexity for grounded, real-time research.
Orchestrate complex multi-agent swarms with topologies like mesh, hierarchical, and star for research, development, and testing workflows.
Handles large-scale tasks by automatically breaking them down into manageable, recursive sub-tasks to overcome context window limits and improve reasoning accuracy on large codebases and document sets.
Generate personalized, verified daily news briefings tailored to your interests, projects, and competitive landscape with strict 7-day source freshness.
A perspective engineering engine that researches, extracts mental models, and generates runnable persona skills based on deep expression DNA analysis.
Intelligent research agent that automatically routes queries between fast web search, deep multi-source synthesis, and academic database lookups.
Execute the implementation planning workflow, generate technical design artifacts, and structure research tasks for Spec Kit projects.