eval-harness
Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.
Identify and document Customer Problems (CP) from business context. Use when starting requirements engineering or when stakeholders describe solutions instead of problems. Step 1 of the Problem-Based SRS methodology.
Generate structured development plans, checklists, and file contexts compatible with the IntelliJ coding-aider plugin.
Implement ReasoningBank adaptive learning with AgentDB's ultra-fast vector backend. Features trajectory tracking, verdict judgment, memory distillation, and pattern recognition for self-learning autonomous agents.
A stage-driven AI writing agent for structured, repeatable, and reversible long-form content production with human-in-the-loop workflows.
Connect your AI agent to the Hugging Face Hub via MCP. Search models, datasets, and papers, manage repos, run cloud compute jobs, and invoke Gradio Spaces as functional AI tools.
Structured problem-framing tool for design sprints and product strategy. Facilitates collaborative or individual sessions to define goals, stakeholders, constraints, and pain points before solution generation.
Generates structured Handoff Pack prompts for delegating scoped coding tasks to Gemini with clear instructions, acceptance criteria, and output requirements.
Generate professional equity research snapshots using consensus estimates, company fundamentals, historical pricing, and macroeconomic indicators to build investment theses.
Expert consultant for designing and building high-quality, consistent AI agent skills. Guides you through discovery, architecture, and creation phases to ensure reliable, composable, and efficient skill delivery.
Generate optimized SQL queries from natural language. Supports BigQuery, PostgreSQL, MySQL, and Snowflake. Analyze database schemas, interpret business requirements, and output ready-to-run queries with explanations.
A framework for managing the end-to-end LLM project lifecycle, from evaluating task-model fit and pipeline architecture design to implementing structured output parsing and agent-assisted development.