observability

Use for observability, logs, metrics, traces, health checks, dashboards, alerts, and SLOs.
kreek/consult · ★ 1 · AI & Automation · score 74
Install: claude install-skill kreek/consult
# Observability

## Iron Law

`NO USER-REACHABLE SERVICE PATH SHIPS BLIND.`

## When to Use

- Logs, metrics, traces, health checks, dashboards, SLOs, alerts, dependency health, incident diagnosis, OpenTelemetry, RED/USE, cardinality, exemplars, or burn-rate alerts.

## When NOT to Use

- Local-only scripts or libraries with no operational surface.
- Error type design; use `error-handling`.
- Release sequencing; use `release`.

## Core Ideas

1. Instrument behavior customers depend on, not just process internals.
2. Logs are structured events with stable names, typed fields, severity, outcome, and trace/correlation IDs. JSON alone is not enough.
3. Use OpenTelemetry semantic conventions where they exist before inventing custom field names.
4. Metrics need bounded labels; cardinality is a production cost and reliability risk.
5. Traces show cross-boundary causality; logs explain decisions.
6. Critical dependencies expose latency, error, timeout, retry, circuit-breaker state, and saturation signals.
7. Dashboards answer current health and likely fault location. Alerts are SLO-backed, actionable, and tied to runbooks.
8. Health checks separate liveness from readiness.
9. Sensitive data is redacted at source; collector filtering is defense in depth.

## Workflow

1. Identify the user-facing path, dependency, queue, or resource being observed. Choose RED for request paths, USE for resources.
2. Add structured logs, metrics, and spans per project convent
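
As a rough illustration of Core Ideas 2, 4, and 9, the sketch below builds a structured log event with a stable name, typed fields, severity, outcome, and a trace ID. It redacts sensitive fields at the source and collapses HTTP status codes into a bounded label value. It uses only the Python standard library. The helper names (`build_event`, `redact`, `status_class`, `log_event`), the `SENSITIVE_KEYS` set, and the event name `checkout.payment.completed` are all hypothetical and not part of this skill; a real project would likely follow its own conventions and OpenTelemetry semantic conventions for field names.

```python
import json
import logging

# Hypothetical list of field names treated as sensitive (assumption for this sketch).
SENSITIVE_KEYS = {"password", "token", "email"}

# Explicit severity-to-level map, so unknown severities fall back predictably.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def redact(fields):
    """Replace values of sensitive keys before they leave the process."""
    return {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v) for k, v in fields.items()}


def build_event(name, severity, outcome, trace_id, **fields):
    """Return a structured event; the fixed keys always win over extra fields."""
    return {
        **redact(fields),
        "event": name,          # stable, dotted event name
        "severity": severity,
        "outcome": outcome,     # e.g. "success" or "failure"
        "trace_id": trace_id,   # correlation with traces
    }


def status_class(status_code):
    """Collapse a status code into at most six label values (1xx..5xx, unknown)."""
    if 100 <= status_code <= 599:
        return f"{status_code // 100}xx"
    return "unknown"


def log_event(logger, event):
    """Emit the event as one JSON line at the matching log level."""
    level = LEVELS.get(event["severity"], logging.INFO)
    logger.log(level, json.dumps(event, sort_keys=True))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    event = build_event(
        "checkout.payment.completed",
        "info",
        "success",
        trace_id="abc123",
        duration_ms=42,
        status_class=status_class(200),
        email="user@example.com",
    )
    log_event(logging.getLogger("checkout"), event)
```

Using `status_class` as a metric label instead of the raw status code, user ID, or URL keeps the number of time series bounded, which is the cardinality concern raised in Core Idea 4.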