reflect-appworld-failure
Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples.
Systematic debugging skill to trace errors backward through call stacks, identify original triggers, and implement layered defenses instead of patching symptoms.
Comprehensive UI testing, visual fidelity analysis, and browser debugging using Chrome DevTools MCP and AI-driven vision models.
Gate 2 development cycle skill that validates observability implementation, including structured logging, OpenTelemetry tracing, and instrumentation coverage, without modifying code.
Verify the novelty of a research idea against recent literature. Use when the user says '查新' (Chinese for "novelty check") or 'novelty check', or needs to confirm whether a method is original.
Perform automated visual regression testing by comparing UI screenshots against established baselines to identify layout shifts, color changes, and rendering regressions.
Designer's eye QA: detects and automates fixes for visual inconsistencies, spacing, hierarchy, and UI polish issues. Iteratively verifies with before/after screenshots.
Manage long-running PapersFlow DeepScan research workflows with asynchronous monitoring, live progress tracking, and automated report generation.
Security-first vetting protocol for AI agent skills. Detects red flags like credential theft, obfuscated code, and unauthorized data exfiltration before installation.
Security-first auditing framework for AI-generated code. Provides multi-level protection including hardcoded secret detection, dangerous pattern identification, and comprehensive vulnerability audits for modern web applications.
Generate professional-grade sound effects from text descriptions. Create audio textures, cinematic impacts, UI sounds, and ambient environments with precision control over duration, looping, and prompt adherence.
A prototype skill for automating YouTube live chat moderation using pattern-based detection for spam, toxic content, and rate limiting, optimized for testing agent reliability before deployment.