evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
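As a rough illustration of how a multi-dimensional rubric, an LLM-as-a-judge scorer, and a regression check can fit together, here is a minimal Python sketch. It is not this skill's actual implementation. The rubric dimensions and weights, the `stub_judge` function, and the 0.05 regression tolerance are all hypothetical. In a real LLM-as-a-judge setup, the judge callable would prompt a model and parse a score from its reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Sequence

# A judge maps (task, response, dimension name) to a score in [0, 1].
# In an LLM-as-a-judge setup this would prompt a model; here it is pluggable.
Judge = Callable[[str, str, str], float]


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float  # relative importance; normalized when aggregating


# Hypothetical rubric: names and weights are illustrative assumptions.
RUBRIC: Sequence[Dimension] = (
    Dimension("correctness", 0.5),
    Dimension("completeness", 0.3),
    Dimension("efficiency", 0.2),
)


def score_response(task: str, response: str, judge: Judge,
                   rubric: Sequence[Dimension] = RUBRIC) -> Dict[str, float]:
    """Score one response on every rubric dimension plus a weighted overall score."""
    total_weight = sum(d.weight for d in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    scores: Dict[str, float] = {}
    for d in rubric:
        s = judge(task, response, d.name)
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score for {d.name!r} out of range: {s}")
        scores[d.name] = s
    # Weighted mean of the per-dimension scores.
    scores["overall"] = sum(scores[d.name] * d.weight for d in rubric) / total_weight
    return scores


def find_regressions(baseline: Mapping[str, float], current: Mapping[str, float],
                     tolerance: float = 0.05) -> Dict[str, float]:
    """Return {dimension: delta} for scores that dropped by more than `tolerance`."""
    return {
        name: current[name] - base
        for name, base in baseline.items()
        if name in current and base - current[name] > tolerance
    }


def stub_judge(task: str, response: str, dimension: str) -> float:
    # Hypothetical fixed scores standing in for a real model call.
    return {"correctness": 0.9, "completeness": 0.6, "efficiency": 0.8}[dimension]


if __name__ == "__main__":
    current = score_response("Summarize the report", "draft summary", stub_judge)
    baseline = {"correctness": 0.9, "completeness": 0.7, "overall": 0.85}
    print(current)
    print(find_regressions(baseline, current))
```

The judge is kept as a plain callable so the same harness can run against a stubbed judge in automated tests and a model-backed judge elsewhere. The regression check only compares per-dimension scores against a stored baseline. A fuller setup would likely also track variance across repeated judge calls.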
A toolkit for building robust LLM integrations: API patterns, streaming, function calling, RAG pipelines, and cost-effective model routing.
Download and analyze YouTube video transcripts to extract technical insights, summarize complex tutorials, and relate video content to your codebase.
Transforms chat conversations into structured Notion documentation, saving insights, decisions, and knowledge to your workspace with proper organization.
Automated global intelligence aggregator for market, geopolitical, and AI news. Features RSS feed integration, real-time alert systems for critical events, and structured report generation with intelligence inference.
Neural web search and code context retrieval via Exa AI. Ideal for documentation, technical research, code examples, and company intelligence.
Deep inquiry framework using Socratic questioning to examine beliefs, uncover hidden assumptions, test evidence, and reach nuanced understanding without lecturing.
Structured reasoning tool for complex problem decomposition, step-by-step analysis, consistency verification, and evidence-based synthesis with confidence scoring.
A unified interface for integrating and managing LLM chat providers such as OpenAI, Anthropic, Google, Azure, and Bedrock within LangChain applications.
Structured parallel brainstorming agent for ideation and conceptual expansion. Uses multi-agent perspectives to evolve vague ideas into practical, actionable visions. Ideation only, not for task planning.
Manually triggers a Hipocampus memory flush to persist current session context to raw logs and initiate the compaction tree process for long-term agent memory maintenance.
Generate real-time AI podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model with WebSocket streaming, complete with PCM-to-WAV conversion and frontend playback integration.