Install: claude install-skill Tibsfox/gsd-skill-creator
# Monitoring and Observability
Production systems require visibility into their behavior. Observability goes beyond simple monitoring by enabling you to ask arbitrary questions about system state using logs, metrics, and traces. This guide covers instrumentation, collection, visualization, alerting, and the operational patterns that prevent alert fatigue while keeping systems reliable.
## The Three Pillars
| Pillar | What It Captures | Best For | Key Tools |
|--------|-----------------|----------|-----------|
| Logs | Discrete events with context | Debugging specific requests, audit trails | ELK, Loki, CloudWatch Logs |
| Metrics | Numeric measurements over time | Trends, thresholds, capacity planning | Prometheus, Datadog, CloudWatch Metrics |
| Traces | Request flow across services | Latency breakdown, dependency mapping | Jaeger, Tempo, X-Ray |

Each operational question maps to the signal best suited to answer it:

| Question | Signal |
|----------|--------|
| "Why did this request fail?" | Logs (event detail) + Traces (call chain) |
| "Is error rate increasing?" | Metrics (counters over time) |
| "Which service is slow?" | Traces (span timing) |
| "What happened at 3:42 AM?" | Logs (timestamped events) |
| "Are we within SLO budget?" | Metrics (error ratio, latency percentiles) |
| "How do services depend on each other?" | Traces (service graph) |
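
To make the mapping above concrete, here is a minimal, dependency-free sketch of a single request emitting all three signals. Everything in it is a hypothetical stand-in for illustration: `handleRequest`, the in-memory `logs`, `spans`, and `counters` stores, and the metric names `requests_total` and `errors_total` are assumptions, not part of any real library. The point is only that a shared `traceId` lets you pivot from a log line to its span, while counters answer rate questions without per-request detail.

```typescript
// Hypothetical in-memory stores standing in for a log pipeline,
// a trace backend, and a metrics registry.
type LogEntry = { timestamp: string; level: "info" | "error"; message: string; traceId: string };
type Span = { traceId: string; name: string; durationMs: number };

const logs: LogEntry[] = [];
const spans: Span[] = [];
const counters = new Map<string, number>();

// Metric: a numeric measurement that only ever increases.
function increment(name: string): void {
  counters.set(name, (counters.get(name) ?? 0) + 1);
}

// Illustrative ID only; real tracers generate IDs per the W3C Trace Context format.
function newTraceId(): string {
  return Math.random().toString(16).slice(2).padEnd(12, "0");
}

function handleRequest(route: string, work: () => void): string {
  const traceId = newTraceId();
  const start = Date.now();
  increment("requests_total");
  let level: LogEntry["level"] = "info";
  let message = `${route} ok`;
  try {
    work();
  } catch (err) {
    increment("errors_total");
    level = "error";
    message = `${route} failed: ${err instanceof Error ? err.message : String(err)}`;
  } finally {
    // Log: a discrete event with context, tagged with the trace ID.
    logs.push({ timestamp: new Date().toISOString(), level, message, traceId });
    // Trace: timing for this unit of work, recorded whether it succeeded or failed.
    spans.push({ traceId, name: route, durationMs: Date.now() - start });
  }
  return traceId;
}

// Usage: one successful request and one failing request.
handleRequest("/health", () => {});
const failedTraceId = handleRequest("/checkout", () => {
  throw new Error("card declined");
});

// "Is error rate increasing?" is answered by metrics.
const errorRatio = (counters.get("errors_total") ?? 0) / (counters.get("requests_total") ?? 1);

// "Why did this request fail?" is answered by the log entry joined to its span via the trace ID.
const failureLog = logs.find((l) => l.traceId === failedTraceId && l.level === "error");
const failureSpan = spans.find((s) => s.traceId === failedTraceId);
```

In a real system the three stores would be separate backends, which is why carrying the trace ID in every log line matters: it is the join key between them.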
## OpenTelemetry SDK Setup
OpenTelemetry provides a vendor-neutral API for emitting all three signals. Instrument once, export anywhere.
### Node.js Auto-Instrumentation
```typescript
// tra