monitoring-ops

Observability patterns - metrics, logging, tracing, alerting, and infrastructure monitoring. Use for: monitoring, observability, prometheus, grafana, metrics, alerting, structured logging, distributed tracing, opentelemetry, SLO, SLI, dashboard, health check, loki, jaeger, datadog, pagerduty.
0xDarkMatter/claude-mods · ★ 22 · DevOps & Infrastructure · score 74
Install: claude install-skill 0xDarkMatter/claude-mods
# Monitoring Operations

Comprehensive observability patterns covering the three pillars (metrics, logging, tracing), alerting strategies, dashboard design, and infrastructure monitoring for production systems.

---

## Three Pillars Quick Reference

Use this table to decide which observability signal fits your need:

| Pillar | Best For | Tools | Data Type |
|--------|----------|-------|-----------|
| **Metrics** | Aggregated numeric measurements, trends, alerting on thresholds | Prometheus, Datadog, CloudWatch, StatsD | Time-series (numeric) |
| **Logs** | Discrete events, error details, audit trails, debugging context | Loki, ELK, CloudWatch Logs, Fluentd | Unstructured/structured text |
| **Traces** | Request flow across services, latency breakdown, dependency mapping | Jaeger, Tempo, Zipkin, Datadog APM | Span trees (structured) |

**When to use which:**

- **"How many requests per second?"** → Metrics (counter + rate)
- **"Why did this specific request fail?"** → Logs (error message + stack trace)
- **"Where is the latency in this request?"** → Traces (span waterfall)
- **"Is the system healthy right now?"** → Metrics (gauges + alerts)
- **"What happened at 3:42 AM?"** → Logs (timestamped event search)
- **"Which downstream service caused the timeout?"** → Traces (span analysis)

**Correlation is key:** Connect all three by embedding `trace_id` in log entries, recording exemplars in metrics, and linking trace spans to log queries.

---

## Metrics Type Decision Tree

Us