k8s-troubleshooter
Systematic Kubernetes troubleshooting, pod diagnostics, cluster health monitoring, and incident response playbooks.
Generate incident response timelines and structured report packs from event logs to facilitate efficient detection-to-recovery tracking.
DevOps and platform engineering patterns: Kubernetes, Terraform, GitOps, CI/CD, observability, incident response, and cloud-native ops.
Security advisory monitoring for NanoClaw WhatsApp bots, providing vulnerability scanning, skill safety checks, and integrity protection through MCP tools.
Emergency recovery suite for Vercel-hosted projects. Manage deployment rollbacks, database migration reverts, cache invalidation, and health verification workflows.
Interactive CLI-based issue management system for tracking, planning, and executing development tasks with full CRUD capabilities.
Focus testing effort on highest-risk areas using risk assessment and prioritization. Use when planning test strategy, allocating resources, or making coverage decisions.
Maintain a structured DEBUG_LOG.md for recording bugs, debugging processes, and solutions to ensure project stability and knowledge retention.
Production-grade observability stack featuring Prometheus metrics, Grafana dashboards, PromQL queries, alerting rules, and AI-powered anomaly detection for cloud-native applications.
A rigorous, four-phase methodology to enforce systematic root cause analysis before applying any code fixes.
An AI-powered TestOps platform and MCP server providing automated failure analysis, RCA matching, and intelligent test orchestration for CI/CD pipelines.
Enforces a strict evidence-based debugging workflow using structured observation, hypothesis testing, and causality validation to eliminate speculation in technical investigations.