reflect-appworld-failure
Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples.
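The following is a minimal sketch of the final step in the workflow described above: turning analyzed failures into playbook bullets that each carry a concrete code example. It is illustrative only. The `FailureRecord` schema, the `to_playbook_bullet` and `build_playbook` helpers, and the `apis.music_app.list_songs` call are hypothetical. They are not this skill's actual implementation and not real AppWorld APIs.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    """One analyzed AppWorld task failure (hypothetical schema)."""
    task_id: str
    api_call: str     # API the agent called (hypothetical name in the example below)
    error: str        # what went wrong, stated as an observable pattern
    fix_snippet: str  # corrected code that avoids the failure


def to_playbook_bullet(record: FailureRecord) -> str:
    """Format one failure as an actionable bullet followed by an indented code example."""
    code = textwrap.indent(record.fix_snippet, "    ")
    return (
        f"- {record.api_call}: {record.error} "
        f"(seen in {record.task_id}). Fix:\n{code}"
    )


def build_playbook(records: list[FailureRecord]) -> list[str]:
    """Emit one bullet per distinct (API, fix) pair, skipping repeats across tasks."""
    seen: set[tuple[str, str]] = set()
    bullets: list[str] = []
    for record in records:
        key = (record.api_call, record.fix_snippet)
        if key in seen:
            continue
        seen.add(key)
        bullets.append(to_playbook_bullet(record))
    return bullets


# Hypothetical example: two tasks that failed the same way on a paginated API.
PAGINATION_FIX = (
    "songs, page = [], 0\n"
    "while batch := apis.music_app.list_songs(page_index=page):\n"
    "    songs.extend(batch)\n"
    "    page += 1"
)
records = [
    FailureRecord("task_001", "apis.music_app.list_songs",
                  "only the first page of results was read", PAGINATION_FIX),
    FailureRecord("task_002", "apis.music_app.list_songs",
                  "only the first page of results was read", PAGINATION_FIX),
]

if __name__ == "__main__":
    for bullet in build_playbook(records):
        print(bullet)
```

Deduplicating on the (API, fix) pair keeps the playbook short: repeated failures of the same pattern collapse into a single bullet instead of producing one near-identical entry per task.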
An intelligent development orchestration skill that provides self-improving code analysis, build error diagnosis, and automated workflow configuration via mcp-prompts integration.
API-first casino for AI agents on Base. Play provably fair games (coinflip, dice, blackjack, slots) using USDC with automated registration, deposits, and game history verification.
Build production-grade AI agents using LangGraph, Anthropic/OpenAI/vLLM, and structured outputs. Features streaming, the A2A protocol, Pydantic validation, vector memory, and guardrails for resilient multi-agent workflows.
Focus testing effort on highest-risk areas using risk assessment and prioritization. Use when planning test strategy, allocating resources, or making coverage decisions.
Real-time observability dashboard for PAI multi-agent activity, featuring live WebSocket streaming, session tracing, and agent workflow debugging.
Autonomous research specialist for verified information gathering, source evaluation, and structured synthesis.
Enforces structured self-assessment checkpoints to validate approach, mitigate risks, and ensure quality before, during, and after task execution.
Diagnose, isolate, and mitigate LLM context failures like lost-in-middle, poisoning, distraction, and context clash to improve agent reliability.
Advanced context engineering system for orchestrating AI agents, memory management, and token optimization to improve long-term persistence and project intelligence.
Queen-led multi-agent orchestration for Claude Code, featuring Byzantine consensus, persistent collective memory, and adaptive task distribution for complex software projects.
Orchestrate parallel Claude Code worker swarms with protocol-based behavioral governance for complex features, multi-step refactors, and long-running autonomous coding sessions.