monitoring-opslisted
Install: claude install-skill 0xDarkMatter/claude-mods
# Monitoring Operations
Comprehensive observability patterns covering the three pillars (metrics, logging, tracing), alerting strategies, dashboard design, and infrastructure monitoring for production systems.
---
## Three Pillars Quick Reference
Use this table to decide which observability signal fits your need:
| Pillar | Best For | Tools | Data Type |
|--------|----------|-------|-----------|
| **Metrics** | Aggregated numeric measurements, trends, alerting on thresholds | Prometheus, Datadog, CloudWatch, StatsD | Time-series (numeric) |
| **Logs** | Discrete events, error details, audit trails, debugging context | Loki, ELK, CloudWatch Logs, Fluentd | Unstructured/structured text |
| **Traces** | Request flow across services, latency breakdown, dependency mapping | Jaeger, Tempo, Zipkin, Datadog APM | Span trees (structured) |
**When to use which:**
- **"How many requests per second?"** → Metrics (counter + rate)
- **"Why did this specific request fail?"** → Logs (error message + stack trace)
- **"Where is the latency in this request?"** → Traces (span waterfall)
- **"Is the system healthy right now?"** → Metrics (gauges + alerts)
- **"What happened at 3:42 AM?"** → Logs (timestamped event search)
- **"Which downstream service caused the timeout?"** → Traces (span analysis)
**Correlation is key:** Connect all three by embedding `trace_id` in log entries, recording exemplars in metrics, and linking trace spans to log queries.
---
## Metrics Type Decision Tree
Us