slo-sli-design

Solid

Define SLIs, SLOs, and error budgets; implement burn rate alerts; integrate with Prometheus.

AI & Automation 14 stars 3 forks Updated 3 days ago MIT

Quality Score: 86/100

Stars (20%)
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Skill: SLO/SLI Design

> **Expertise:** SLI selection, SLO target setting, error budget calculation, burn rate alerting, Sloth/Pyrra integration.

## When to load

When defining SLOs for a new service, setting up error budget tracking, or reviewing existing SLOs after an incident.

## SLI Selection Framework

```
Step 1: What does the user care about?
        → "The checkout completes successfully and quickly"

Step 2: What CAN we measure?
        → HTTP status codes (5xx = failure), p99 latency

Step 3: Define the SLI formula
        → Availability SLI: good_requests / total_requests
          where good = status < 500 AND latency < 500ms

Step 4: Pick an SLO target (start conservative, tighten later)
        → 99.5% (don't chase 99.99% without data; a target that tight
          leaves almost no error budget and spends effort on caution)

Step 5: Calculate the error budget
        → 100% - 99.5% = 0.5% of a 28-day window
        → 0.5% × 28 × 24 × 60 = 201.6 minutes
```

## Prometheus SLO Implementation (manual)

```yaml
# Recording rules for SLO tracking.
# Assumes a latency histogram http_request_duration_seconds with
# `service` and `status` labels and a 0.5s bucket boundary. Latency is
# selected through the histogram's `le` label; a plain request counter
# has no `duration_bucket` label to filter on.
groups:
  - name: slo.checkout-service
    interval: 30s
    rules:
      # Good requests (non-5xx, latency within the 0.5s bucket)
      - record: slo:http_requests_good:rate5m
        expr: |
          sum(rate(http_request_duration_seconds_bucket{
            service="checkout-service",
            status!~"5..",
            le="0.5"
          }[5m]))

      # Total requests
      - record: slo:http_requests_total:rate5m
        expr: |
          sum(rate(http_requests_total{service="checkout-service"}[5m]))

      # SLI = good / total
      - record: slo:http_availa...
```
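The Step 3 formula can be sanity-checked offline before any recording rules exist. This is a minimal sketch, assuming a hypothetical `Request` record that holds a status code and a latency in seconds; treating an empty window as fully available is also an assumption, not part of the framework.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical record shape for illustration)."""
    status: int        # HTTP status code
    latency_s: float   # request latency in seconds


def availability_sli(requests: Iterable[Request],
                     latency_threshold_s: float = 0.5) -> float:
    """Return good_requests / total_requests.

    A request counts as good when status < 500 AND latency is strictly
    below latency_threshold_s. An empty window returns 1.0 (assumption:
    no traffic means no failures).
    """
    total = 0
    good = 0
    for r in requests:
        total += 1
        if r.status < 500 and r.latency_s < latency_threshold_s:
            good += 1
    return good / total if total else 1.0


if __name__ == "__main__":
    sample = [
        Request(200, 0.120),  # good
        Request(404, 0.050),  # good: a 4xx is not a server error
        Request(200, 0.750),  # bad: too slow
        Request(503, 0.010),  # bad: server error
    ]
    print(availability_sli(sample))
```

With two good requests out of four, the sample window has an SLI of 0.5. Counting 4xx responses as good keeps client mistakes from consuming the service's error budget.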
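The Step 5 arithmetic generalizes to any target and window. This is a minimal sketch, assuming the SLO target is passed as a fraction (0.995 for 99.5%). `error_budget_minutes` and `error_budget_requests` are illustrative names, not functions from Sloth, Pyrra, or Prometheus.

```python
def error_budget_minutes(slo_target: float, window_days: int = 28) -> float:
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def error_budget_requests(slo_target: float, total_requests: int) -> float:
    """Allowed failed requests for a request-based SLI (good / total)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


if __name__ == "__main__":
    # Compare how much room each target leaves over a 28-day window.
    for target in (0.995, 0.999, 0.9999):
        print(f"{target:.2%}: {error_budget_minutes(target):.1f} min / 28 days")
```

This prints about 201.6 minutes at 99.5%, 40.3 minutes at 99.9%, and 4.0 minutes at 99.99%, which makes the Step 4 advice concrete: a 99.99% target leaves roughly four minutes of failure per 28 days. For a request-based SLI, the same 0.5% budget means 5,000 failed requests out of every million.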
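Burn rate alerting builds on the same numbers. Burn rate is the observed error ratio divided by the error budget (1 - SLO target), so a burn rate of 1 spends exactly the whole budget over the SLO window. The sketch below follows the multiwindow, multi-burn-rate pattern described in the Google SRE Workbook, not a rule taken from this skill. The function names are hypothetical, and in practice the error ratios would come from Prometheus queries over a long and a short window (for example 1h and 5m). The Workbook's 14.4 threshold pages when 2% of a 30-day budget is spent in one hour; applied to the 28-day window used here, the same rule gives 13.44.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def burn_rate_threshold(budget_fraction: float, alert_window_hours: float,
                        slo_window_hours: float = 28 * 24) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `alert_window_hours`."""
    return budget_fraction * slo_window_hours / alert_window_hours


def should_page(long_window_error_ratio: float, short_window_error_ratio: float,
                slo_target: float, threshold: float) -> bool:
    """Multiwindow check: both windows must burn faster than the threshold.

    The long window confirms the problem is significant; the short window
    lets the alert stop firing soon after the incident ends.
    """
    return (burn_rate(long_window_error_ratio, slo_target) > threshold
            and burn_rate(short_window_error_ratio, slo_target) > threshold)


if __name__ == "__main__":
    # Page if 2% of a 28-day budget would be spent within one hour.
    page_threshold = burn_rate_threshold(budget_fraction=0.02, alert_window_hours=1)
    # Hypothetical error ratios: 8% over the last hour, 10% over the last 5 minutes.
    print(round(page_threshold, 2), should_page(0.08, 0.10, 0.995, page_threshold))
```

At a 99.5% target, an 8% error ratio is a burn rate of about 16, above the 13.44 threshold; requiring the short window to agree avoids paging on an hour-old spike that has already recovered.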

Details

Author: sawrus
Repository: sawrus/agent-guides
Created: 3 months ago
Last Updated: 3 days ago
Language: Shell
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Solid

slo-implementation

Implement SLOs end-to-end in Prometheus — recording rules, burn rate alerts, error budget dashboards, and Sloth/pyrra integration.

14 Updated 3 days ago

sawrus

AI & Automation Listed

slo-implementation

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.

21 Updated 5 days ago

HermeticOrmus

AI & Automation Listed

slo-implementation

335 Updated today

aiskillstore

AI & Automation Solid

slo-implementation

36,166 Updated yesterday

wshobson

AI & Automation Featured

slo-implementation

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

39,227 Updated today

sickn33