k8s-troubleshooting
AI-powered Kubernetes and OpenShift troubleshooting. Proactively assess cluster health, debug pod failures, analyze logs, and validate security using Popeye-inspired patterns.
Introduction
The k8s-troubleshooting skill provides an agent-based interface for diagnosing and managing Kubernetes and OpenShift environments. It is aimed at DevOps engineers, SREs, and cluster administrators who need fast root-cause analysis and proactive health monitoring. Building on patterns from Popeye, a Kubernetes cluster sanitizer, the skill flags misconfigurations, resource inefficiencies, and security vulnerabilities across the cluster.
- Performs comprehensive cluster health assessments, including node, pod, and service status checks.
- Runs automated diagnostics for common failure modes such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, and PersistentVolumeClaims (PVCs) stuck in Pending.
- Supports deep log analysis with stern and kubectl, including multi-pod log streaming and event interpretation.
- Validates security and RBAC configuration, including detection of privileged containers and containers running as root.
- Improves resource allocation by flagging container specs that are missing resource requests or limits.
- Supports OpenShift-specific troubleshooting, covering Security Context Constraints (SCCs), Routes, Operators, and BuildConfigs.
- Detects whether the cluster is plain Kubernetes or OpenShift and translates commands between kubectl and oc accordingly.
- Provides automated recommendations for performance tuning, reliability, and high-availability (HA) posture.
- Usage note: The skill uses kubectl as its primary interface. Make sure the correct kubeconfig context is selected, that you have the required namespace permissions, and that CLI tools such as kubectl, oc, k9s, and krew are installed locally.
- Inputs: Accepts natural-language requests such as 'Why is this pod crashing?', 'Check cluster security', or 'Find storage issues in the production namespace'.
- Outputs: Produces diagnostic reports, actionable remediation steps, and suggested CLI commands for resolving the issues it finds.
- Constraints: Results depend on cluster connectivity and on read (get/list) permissions in the relevant namespaces.
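Node checks in a cluster health assessment usually start from the conditions Kubernetes reports on each Node object. The TypeScript sketch below flags nodes that are not Ready, are under memory, disk, or PID pressure, or are cordoned. It is not the skill's own code: `KubeNode` and `nodeIssues` are hypothetical names, and the interfaces model only the few fields of `kubectl get nodes -o json` output that the check reads.

```typescript
// Minimal subset of a Node object from `kubectl get nodes -o json`.
// Only the fields read below are modeled.
interface NodeCondition {
  type: string;
  status: string; // "True", "False" or "Unknown"
}

interface KubeNode {
  metadata: { name: string };
  spec?: { unschedulable?: boolean };
  status: { conditions?: NodeCondition[] };
}

// Conditions that signal trouble when their status is "True".
const PRESSURE_CONDITIONS = new Set(["MemoryPressure", "DiskPressure", "PIDPressure"]);

// Hypothetical helper: returns human-readable issues for one node.
function nodeIssues(node: KubeNode): string[] {
  const issues: string[] = [];
  const conditions = node.status.conditions ?? [];
  const ready = conditions.find((c) => c.type === "Ready");
  if (ready?.status !== "True") {
    issues.push(`NotReady (Ready=${ready?.status ?? "missing"})`);
  }
  for (const c of conditions) {
    if (PRESSURE_CONDITIONS.has(c.type) && c.status === "True") issues.push(c.type);
  }
  // spec.unschedulable is set by `kubectl cordon`.
  if (node.spec?.unschedulable === true) issues.push("cordoned");
  return issues;
}

// Example: a node that is Ready but under disk pressure and cordoned.
const worker: KubeNode = {
  metadata: { name: "worker-1" },
  spec: { unschedulable: true },
  status: {
    conditions: [
      { type: "Ready", status: "True" },
      { type: "DiskPressure", status: "True" },
      { type: "MemoryPressure", status: "False" },
    ],
  },
};
console.log(nodeIssues(worker)); // returns ["DiskPressure", "cordoned"]
```

A node whose Ready condition is "Unknown" usually means the kubelet has stopped reporting, so the sketch reports it as NotReady rather than treating it as healthy.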
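The failure modes listed above correspond to specific fields in the objects kubectl returns: a container's `state.waiting.reason`, its `lastState.terminated.reason`, and a PVC's `status.phase`. The following TypeScript sketch shows one way to classify them. The function names and the simplified interfaces are hypothetical, chosen for illustration, and not taken from the skill.

```typescript
// Minimal subset of the Pod and PersistentVolumeClaim JSON returned by
// `kubectl get pods -o json` and `kubectl get pvc -o json`.
interface ContainerState {
  waiting?: { reason?: string };
  terminated?: { reason?: string; exitCode?: number };
}

interface ContainerStatus {
  name: string;
  restartCount: number;
  state?: ContainerState;
  lastState?: ContainerState;
}

interface Pod {
  metadata: { name: string };
  status: { containerStatuses?: ContainerStatus[] };
}

interface Claim {
  metadata: { name: string };
  status?: { phase?: string };
}

// Hypothetical helper: maps each failing container to the failure mode it shows.
function classifyContainers(pod: Pod): Map<string, string> {
  const findings = new Map<string, string>();
  for (const cs of pod.status.containerStatuses ?? []) {
    const waiting = cs.state?.waiting?.reason;
    // Check OOMKilled first: a container that keeps running out of memory usually
    // also reports CrashLoopBackOff, but the termination reason is the root cause.
    if (cs.state?.terminated?.reason === "OOMKilled" || cs.lastState?.terminated?.reason === "OOMKilled") {
      findings.set(cs.name, "OOMKilled");
    } else if (waiting === "CrashLoopBackOff") {
      findings.set(cs.name, "CrashLoopBackOff");
    } else if (waiting === "ImagePullBackOff" || waiting === "ErrImagePull") {
      // ErrImagePull is reported before the kubelet starts backing off.
      findings.set(cs.name, "ImagePullBackOff");
    }
  }
  return findings;
}

// Hypothetical helper: names of claims still waiting to be bound to a volume.
function pendingClaims(claims: Claim[]): string[] {
  return claims.filter((c) => c.status?.phase === "Pending").map((c) => c.metadata.name);
}

const pod: Pod = {
  metadata: { name: "api-7d9f" },
  status: {
    containerStatuses: [
      {
        name: "api",
        restartCount: 6,
        state: { waiting: { reason: "CrashLoopBackOff" } },
        lastState: { terminated: { reason: "OOMKilled", exitCode: 137 } },
      },
      { name: "sidecar", restartCount: 0, state: { waiting: { reason: "ImagePullBackOff" } } },
    ],
  },
};
console.log(classifyContainers(pod)); // api → OOMKilled, sidecar → ImagePullBackOff
console.log(pendingClaims([{ metadata: { name: "data" }, status: { phase: "Pending" } }])); // returns ["data"]
```

Ordering the checks this way reports the cause (memory exhaustion) instead of the symptom (the restart back-off) when both are present.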
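Multi-pod log streaming, as mentioned in the feature list, typically comes down to invoking kubectl or stern with a label selector. The TypeScript sketch below builds such a command as an argument array rather than a shell string. The `LogQuery` shape and `logCommand` name are hypothetical. The stern branch assumes a stern release that accepts `-l` in place of a pod-name query.

```typescript
// Hypothetical query shape; the field names are illustrative, not the skill's API.
interface LogQuery {
  namespace: string;
  selector: string; // label selector, e.g. "app=api"
  since: string; // relative duration, e.g. "15m"
  tail: number; // lines of history per container
  useStern: boolean;
}

// Builds an argv array (avoids shell quoting issues) for multi-pod log retrieval.
function logCommand(q: LogQuery): string[] {
  if (q.useStern) {
    // stern tails every matching pod and container, prefixing each line with its source.
    return ["stern", "-n", q.namespace, "-l", q.selector, "--since", q.since, "--tail", String(q.tail)];
  }
  // kubectl logs with a selector; --prefix tags each line with its pod and container.
  return [
    "kubectl", "logs", "-n", q.namespace, "-l", q.selector,
    "--all-containers=true", "--prefix", `--since=${q.since}`, `--tail=${q.tail}`,
  ];
}

const argv = logCommand({ namespace: "production", selector: "app=api", since: "15m", tail: 200, useStern: false });
console.log(argv.join(" "));
// kubectl logs -n production -l app=api --all-containers=true --prefix --since=15m --tail=200
```

Setting `--tail` explicitly matters with kubectl: when a selector is given, `kubectl logs` shows only the last 10 lines per container by default.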
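The privileged-container and root-user detection described above can be expressed as a check over pod and container `securityContext` fields. This TypeScript sketch is an illustration only; `securityFindings` and the trimmed interfaces are hypothetical, and it assumes the common rule that a container-level setting overrides the pod-level one.

```typescript
// Minimal subset of the spec fields that govern privilege and user identity.
interface SecurityContext {
  privileged?: boolean;
  runAsUser?: number;
  runAsNonRoot?: boolean;
}

interface SpecContainer {
  name: string;
  securityContext?: SecurityContext;
}

interface PodSpec {
  securityContext?: { runAsUser?: number; runAsNonRoot?: boolean };
  containers: SpecContainer[];
}

// Hypothetical helper: lists privileged and (possibly) root containers.
function securityFindings(spec: PodSpec): string[] {
  const findings: string[] = [];
  for (const c of spec.containers) {
    const sc: SecurityContext = c.securityContext ?? {};
    if (sc.privileged === true) findings.push(`${c.name}: privileged`);
    // A container-level setting overrides the pod-level one.
    const uid = sc.runAsUser ?? spec.securityContext?.runAsUser;
    const nonRoot = sc.runAsNonRoot ?? spec.securityContext?.runAsNonRoot;
    if (uid === 0) {
      findings.push(`${c.name}: runs as root (UID 0)`);
    } else if (uid === undefined && nonRoot !== true) {
      // The spec is silent, so the image's default USER decides, and that may be root.
      findings.push(`${c.name}: may run as root`);
    }
  }
  return findings;
}

const spec: PodSpec = {
  securityContext: { runAsNonRoot: true },
  containers: [
    { name: "web", securityContext: { privileged: true } },
    { name: "debug", securityContext: { runAsUser: 0 } },
  ],
};
console.log(securityFindings(spec)); // returns ["web: privileged", "debug: runs as root (UID 0)"]
```

The "may run as root" case is deliberately conservative: without inspecting the image, a spec that neither sets a UID nor requires `runAsNonRoot` cannot rule out root.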
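Flagging missing requests and limits, as the feature list describes, can be sketched as a scan over each container's `resources` block. The TypeScript below is a hypothetical illustration (`missingResources` is not the skill's API) that checks only `cpu` and `memory`.

```typescript
// Minimal subset of a container's resource requirements.
interface ResourceRequirements {
  requests?: Record<string, string>;
  limits?: Record<string, string>;
}

interface ResourcedContainer {
  name: string;
  resources?: ResourceRequirements;
}

// Hypothetical helper: lists every missing cpu/memory request or limit.
function missingResources(containers: ResourcedContainer[]): string[] {
  const issues: string[] = [];
  for (const c of containers) {
    for (const kind of ["requests", "limits"] as const) {
      const values: Record<string, string> = c.resources?.[kind] ?? {};
      for (const resource of ["cpu", "memory"]) {
        if (!(resource in values)) issues.push(`${c.name}: no ${resource} ${kind}`);
      }
    }
  }
  return issues;
}

const containers: ResourcedContainer[] = [
  { name: "app", resources: { requests: { cpu: "250m", memory: "256Mi" }, limits: { memory: "512Mi" } } },
  { name: "worker" },
];
console.log(missingResources(containers));
// returns ["app: no cpu limits", "worker: no cpu requests", "worker: no memory requests",
//          "worker: no cpu limits", "worker: no memory limits"]
```

The sketch treats absence literally, which fits checking manifests before they are applied. On live objects the API server may already have filled in values: requests default to limits when only limits are set, and a LimitRange can inject defaults at admission.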
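The kubectl/oc translation listed above depends on first detecting the platform. One common heuristic, assumed here rather than taken from the skill's source, is to look for `*.openshift.io` API groups in the output of `kubectl api-versions`. The TypeScript sketch below applies that check and swaps the binary name for standard subcommands. `isOpenShift` and `forPlatform` are hypothetical names.

```typescript
// Decides whether the cluster is OpenShift from the output of `kubectl api-versions`,
// which prints one "group/version" per line (the core API appears as just "v1").
function isOpenShift(apiVersions: string): boolean {
  return apiVersions
    .split("\n")
    .map((line) => line.trim())
    .some((gv) => /^[^/]+\.openshift\.io\//.test(gv));
}

// oc includes the standard kubectl subcommands, so for those only the binary changes.
// OpenShift-only subcommands (e.g. `oc new-project`, `oc rsh`) have no direct kubectl
// equivalent and are outside this sketch.
function forPlatform(command: string, openshift: boolean): string {
  const prefix = "kubectl ";
  return openshift && command.startsWith(prefix) ? "oc " + command.slice(prefix.length) : command;
}

const sample = "apps/v1\nroute.openshift.io/v1\nv1\n";
const cmd = forPlatform("kubectl get pods -n production", isOpenShift(sample));
console.log(cmd); // oc get pods -n production
```

OKD clusters expose the same API groups, so they are detected as OpenShift too, which is usually the desired behavior.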
Repository Stats
- Stars: 4
- Forks: 1
- Open Issues: 0
- Language: TypeScript
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 4, 2026, 01:18 AM