monitoring-observability

Provides monitoring and observability best practices covering the three pillars (logs, metrics, traces), OpenTelemetry instrumentation, Prometheus/Grafana dashboards, SLO-based alerting, and APM strategies. Use when setting up monitoring, observability, prometheus, grafana, opentelemetry, alerting, tracing, logging, metrics, dashboards, SLOs, or APM.
Tibsfox/gsd-skill-creator · ★ 61 · AI & Automation · score 74
Install: claude install-skill Tibsfox/gsd-skill-creator
# Monitoring and Observability

Production systems require visibility into their behavior. Observability goes beyond simple monitoring by enabling you to ask arbitrary questions about system state using logs, metrics, and traces. This guide covers instrumentation, collection, visualization, alerting, and the operational patterns that prevent alert fatigue while keeping systems reliable.

## The Three Pillars

| Pillar | What It Captures | Best For | Key Tools |
|--------|-----------------|----------|-----------|
| Logs | Discrete events with context | Debugging specific requests, audit trails | ELK, Loki, CloudWatch Logs |
| Metrics | Numeric measurements over time | Trends, thresholds, capacity planning | Prometheus, Datadog, CloudWatch Metrics |
| Traces | Request flow across services | Latency breakdown, dependency mapping | Jaeger, Tempo, X-Ray |

| Question | Signal |
|----------|--------|
| "Why did this request fail?" | Logs (event detail) + Traces (call chain) |
| "Is error rate increasing?" | Metrics (counters over time) |
| "Which service is slow?" | Traces (span timing) |
| "What happened at 3:42 AM?" | Logs (timestamped events) |
| "Are we within SLO budget?" | Metrics (error ratio, latency percentiles) |
| "How do services depend on each other?" | Traces (service graph) |

## OpenTelemetry SDK Setup

OpenTelemetry provides a vendor-neutral API for emitting all three signals. Instrument once, export anywhere.

### Node.js Auto-Instrumentation

```typescript
// tra