service-mesh-observability
Implement production-grade observability for Istio and Linkerd service meshes, including distributed tracing, metric dashboards, and golden signal monitoring.
Introduction
The Service Mesh Observability skill provides a framework for instrumenting, monitoring, and debugging microservices architectures. Designed for platform engineers and SREs working with Istio and Linkerd, it packages common observability patterns for the problems typical of distributed systems: non-deterministic latency, intermittent network failures, and complex service-to-service dependencies. Drawing on the three pillars of observability (metrics, traces, and logs), the agent can guide users through setting up monitoring stacks that give visibility into the mesh. Use this skill when you need to define Service Level Objectives (SLOs) based on golden signals, troubleshoot connectivity bottlenecks, or visualize traffic flow between services to identify performance regressions.
- Full implementation templates for Istio with Prometheus and Grafana, including custom PromQL queries for request rates, error percentages (5xx), and P99 latency computed from histogram buckets.
- Deep integration guidance for Jaeger distributed tracing, covering sampling configurations and Zipkin collector deployments.
- Automated Linkerd Viz tooling for live traffic inspection, route-based metrics, and dependency mapping.
- Standardized Golden Signal dashboarding frameworks focusing on latency, traffic, error rates, and resource saturation.
- Built-in support for alert threshold definitions and anomaly detection configuration for mesh workloads.
- Requires an existing Kubernetes cluster with an active service mesh (Istio or Linkerd) installed.
- Inputs include environment configuration, namespace definitions, and resource names; outputs include YAML manifests, PromQL expressions, and CLI command sequences for observability tooling.
- Designed for use with kubectl, helm, and platform-specific mesh CLIs.
- Follows best practices for non-intrusive monitoring to keep overhead on service performance minimal while maintaining high-fidelity data collection.
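To make the PromQL item above concrete, here is a minimal sketch of how request rate, 5xx error percentage, and P99 latency expressions might be generated for one service. It assumes Istio's standard metrics (`istio_requests_total` and `istio_request_duration_milliseconds_bucket`) are being scraped by Prometheus. The helper names, the `checkout` service, the 5-minute window, and the choice of `reporter="destination"` are illustrative assumptions, not part of the skill's own templates.

```python
from typing import Dict


def selector(service: str, reporter: str = "destination") -> str:
    """Build the label selector shared by all three queries.

    Assumption: metrics are read from the destination-side proxy report.
    """
    return f'reporter="{reporter}",destination_service_name="{service}"'


def golden_signal_queries(service: str, window: str = "5m") -> Dict[str, str]:
    """Return PromQL strings for traffic, errors, and latency of one service."""
    sel = selector(service)

    # Traffic: requests per second over the chosen window.
    total = f"sum(rate(istio_requests_total{{{sel}}}[{window}]))"

    # Errors: only responses whose HTTP code matches 5xx.
    errors = (
        f'sum(rate(istio_requests_total{{{sel},response_code=~"5.."}}[{window}]))'
    )

    # Latency: P99 estimated from the duration histogram buckets.
    p99 = (
        "histogram_quantile(0.99, sum by (le) ("
        f"rate(istio_request_duration_milliseconds_bucket{{{sel}}}[{window}])))"
    )

    return {
        "request_rate": total,
        "error_percent": f"100 * {errors} / {total}",
        "p99_latency_ms": p99,
    }


if __name__ == "__main__":
    # "checkout" is a hypothetical service name used only for illustration.
    for name, query in golden_signal_queries("checkout").items():
        print(f"{name}: {query}")
```

The resulting strings can be pasted into a Grafana panel or a Prometheus alert rule. Note that the error percentage evaluates to NaN while the service receives no traffic, so an alert built on it may need a separate guard on the request rate.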
Repository Stats
- Stars: 34,493
- Forks: 3,737
- Open Issues: 4
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 29, 2026, 06:18 AM