sre-reliability-audit

Assess Site Reliability maturity across five dimensions — SLOs/SLIs, runbooks, on-call, postmortems, game days — with per-dimension commentary and uplift path. Static, live (PagerDuty/Opsgenie), and runtime (game day) modes.
anthril/official-claude-plugins · ★ 3 · AI & Automation · score 82
Install: claude install-skill anthril/official-claude-plugins
# SRE Reliability Audit

<!-- anthril-output-directive -->
> **Output path directive (canonical — overrides in-body references).**
> All file outputs from this skill MUST be written under `.anthril/audits/sre-reliability-audit/`.
> Run `mkdir -p .anthril/audits/sre-reliability-audit` before the first `Write` call.
> Primary artefact: `.anthril/audits/sre-reliability-audit/<artefact>`.
> Do NOT write to the project root or to bare filenames at cwd.
> Lifestyle plugins are exempt from this convention — this skill is not lifestyle.

## When to use

Run this skill when the user mentions:

- SRE audit, reliability maturity
- SLO review, SLI definition, error-budget alerts
- Runbook audit, runbook quality
- On-call review, rotation, escalation
- Postmortem quality, blameless template
- Game day, chaos engineering at the organisational level

Narrative assessment — scores five dimensions 0–4 rather than emitting findings-with-IDs. SLOs and SLIs (defined, error budgets computed, burn-rate alerts, review cadence), runbook quality (one per paging alert, executable steps, freshness SLA), on-call readiness (rotation, escalation policy, handover template, fair schedule), postmortem culture (blameless template, action-item tracking, retrospective cadence), and game days (scoped chaos experiments, scheduled, learnings documented).

## Before You Start

1. **Determine operating mode.** `--live` reads from PagerDuty/Opsgenie/incident.io via API keys in env (`PD_API_KEY`, `OG_API_KEY`). `--run