slo-implementation

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
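The arithmetic behind these terms is small enough to sketch directly. The Python below is a minimal illustration and is not part of the skill. It rests on three assumptions:

- An SLI is a good-events/total-events ratio over one fixed window.
- A "month" is 30 days.
- Burn rate uses its commonly used definition: observed error ratio divided by (1 - SLO target).

The names `slo_status`, `allowed_downtime_minutes`, `burn_rate`, and `SLOStatus` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SLOStatus:
    """Snapshot of one SLO over one measurement window (hypothetical helper type)."""
    sli: float               # measured good/total ratio
    target: float            # SLO target, e.g. 0.999 for 99.9%
    budget_total: float      # bad events the SLO tolerates in this window
    budget_consumed: float   # bad events actually observed
    budget_remaining: float  # fraction of budget left; negative once the SLO is breached


def slo_status(good: int, total: int, target: float) -> SLOStatus:
    """Compute SLI and event-based error budget from counts (hypothetical name)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    sli = good / total
    budget_total = (1.0 - target) * total  # error budget = (1 - SLO) * total events
    bad = total - good
    remaining = 1.0 - bad / budget_total
    return SLOStatus(sli, target, budget_total, bad, remaining)


def allowed_downtime_minutes(target: float, days: float = 30.0) -> float:
    """Time-based budget: minutes of full outage allowed in `days` days (assumed 30-day month)."""
    return (1.0 - target) * days * 24 * 60


def burn_rate(error_ratio: float, target: float) -> float:
    """Budget spend speed: 1.0 uses the budget up exactly at the end of the SLO window."""
    return error_ratio / (1.0 - target)


if __name__ == "__main__":
    s = slo_status(good=999_500, total=1_000_000, target=0.999)
    print(f"SLI={s.sli:.4%} budget={s.budget_total:.0f} used={s.budget_consumed} "
          f"remaining={s.budget_remaining:.0%}")
    print(f"99.9% allows {allowed_downtime_minutes(0.999):.1f} min per 30 days")
    print(f"burn rate at 1.44% errors: {burn_rate(0.0144, 0.999):.1f}x")
```

The example run uses one million requests, 500 failures, and a 99.9% target. Under those inputs the sketch reports that half of a 1,000-request budget has been spent. A 30-day month at 99.9% comes out to about 43.2 minutes, consistent with the availability table in the skill. The budget is counted in events rather than minutes because request-based SLIs, like the PromQL ratios in the skill, count events.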
HermeticOrmus/claude-code-game-development · ★ 20 · AI & Automation · score 81
Install: claude install-skill HermeticOrmus/claude-code-game-development
# SLO Implementation

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

## Purpose

Implement measurable reliability targets using SLIs, SLOs, and error budgets to balance reliability with innovation velocity.

## When to Use

- Define service reliability targets
- Measure user-perceived reliability
- Implement error budgets
- Create SLO-based alerts
- Track reliability goals

## SLI/SLO/SLA Hierarchy

```
SLA (Service Level Agreement)
  ↓ Contract with customers
SLO (Service Level Objective)
  ↓ Internal reliability target
SLI (Service Level Indicator)
  ↓ Actual measurement
```

## Defining SLIs

### Common SLI Types

#### 1. Availability SLI

```promql
# Successful (non-5xx) requests / Total requests
sum(rate(http_requests_total{status!~"5.."}[28d]))
/
sum(rate(http_requests_total[28d]))
```

#### 2. Latency SLI

```promql
# Requests served within 0.5 s / Total requests
# (le="0.5" must match an existing histogram bucket boundary)
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
/
sum(rate(http_request_duration_seconds_count[28d]))
```

#### 3. Durability SLI

```promql
# Successful writes / Total writes over the same 28-day window
# (increase() turns the cumulative counters into per-window counts)
sum(increase(storage_writes_successful_total[28d]))
/
sum(increase(storage_writes_total[28d]))
```

**Reference:** See `references/slo-definitions.md`

## Setting SLO Targets

### Availability SLO Examples

| SLO % | Downtime/Month | Downtime/Year |
|-------|----------------|---------------|
| 99% | 7.2 hours | 3.65 days |
| 99.9% | 43.2 minutes | 8.76 hours |
| 99.95%| 21.