prometheus-alertmanager

Solid

Write production-quality Prometheus alert rules, recording rules, and Alertmanager routing configs.

AI & Automation · 14 stars · 3 forks · Updated 3 days ago · MIT

Quality Score: 86/100

Stars (20%): 39
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 80
License (10%): 100
Description (5%): 100

Skill Content

# Skill: Prometheus & Alertmanager

> **Expertise:** PromQL, alert rules, recording rules, Alertmanager routing, inhibition, silences.

## When to load

When writing alert rules, debugging PromQL, configuring Alertmanager routing, or investigating a firing alert.

## Golden Signal Alert Rules

```yaml
# alerts/service-golden-signals.yaml
groups:
  - name: service.golden-signals
    rules:
      # ── Errors ────────────────────────────────────────
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace, service)
            /
            sum(rate(http_requests_total[5m])) by (namespace, service)
          ) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 1% — {{ $labels.service }} in {{ $labels.namespace }}"
          description: "Current error rate: {{ $value | humanizePercentage }}"
          runbook_url: "https://runbooks.internal/high-error-rate"

      # ── Latency ───────────────────────────────────────
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (namespace, service, le)
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency > 1s — {{ $labels.service }}"
          description: "p99: {{ $value | humanizeDuration }}"
          runbook_url: "https...
```
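The `HighErrorRate` expression recomputes two per-service sums on every rule evaluation. A common way to make this cheaper and reusable is to precompute those sums with recording rules. The sketch below is one possible layout, not the skill's own file: the file path and all recorded series names are hypothetical, chosen to follow the `level:metric:operations` naming convention from the Prometheus documentation.

```yaml
# Hypothetical file: rules/service-golden-signals-recording.yaml
groups:
  - name: service.golden-signals.recording
    rules:
      # Per-service request rate (hypothetical series name, level:metric:operations style).
      - record: namespace_service:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (namespace, service)

      # Per-service 5xx rate, aggregated by the same labels as the series above.
      - record: namespace_service:http_requests_5xx:rate5m
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace, service)

      # Error ratio: both operands carry only (namespace, service), so they match one-to-one.
      # Rules in a group run in order, so this uses the two results computed just above.
      - record: namespace_service:http_requests_error_ratio:rate5m
        expr: |
          namespace_service:http_requests_5xx:rate5m
          /
          namespace_service:http_requests:rate5m
```

With these series recorded, the alert expression could shrink to `namespace_service:http_requests_error_ratio:rate5m > 0.01`, keeping the same `for`, labels, and annotations. Running `promtool check rules <file>` catches syntax errors before Prometheus loads the file.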
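The rules above set a `severity` label and preserve `namespace` and `service` through aggregation. Those are the labels Alertmanager routes and inhibits on. The following is a minimal routing sketch under stated assumptions: the receiver names and webhook URLs are hypothetical placeholders, and the timing values are illustrative, not recommendations from the skill.

```yaml
# alertmanager.yml sketch; receiver names and webhook URLs are hypothetical.
route:
  receiver: default
  # Group by the labels the golden-signal rules keep after aggregation.
  group_by: ['alertname', 'namespace', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # severity=critical (e.g. HighErrorRate) goes to the paging receiver.
    - matchers:
        - severity="critical"
      receiver: oncall-pager
      repeat_interval: 1h
    # severity=warning (e.g. HighP99Latency) goes to a chat receiver.
    - matchers:
        - severity="warning"
      receiver: team-chat

inhibit_rules:
  # While a critical alert fires for a namespace/service pair,
  # suppress warning alerts that carry the same namespace and service.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['namespace', 'service']

receivers:
  # Catch-all with no integrations: alerts that match no child route are dropped.
  - name: default
  - name: oncall-pager
    webhook_configs:
      - url: 'http://pager-bridge.example.internal/alerts'
  - name: team-chat
    webhook_configs:
      - url: 'http://chat-bridge.example.internal/alerts'
```

Because `equal` lists `namespace` and `service`, a critical error-rate alert for one service mutes only that service's latency warning, not warnings elsewhere. `amtool check-config alertmanager.yml` validates the file. The `matchers`, `source_matchers`, and `target_matchers` fields need Alertmanager 0.22 or later; older releases use `match`, `source_match`, and `target_match`.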

Details

Author: sawrus
Repository: sawrus/agent-guides
Created: 3 months ago
Last Updated: 3 days ago
Language: Shell
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

prometheus-grafana

Expert skill for Prometheus metrics and Grafana dashboards. Write and validate PromQL queries, generate Grafana dashboard JSON, create alerting and recording rules, analyze metric cardinality, and debug scrape configurations.

1,034 Updated today
a5c-ai
DevOps & Infrastructure Listed

monitoring

Monitoring and alerting

1 Updated today
ryukyagamilight
AI & Automation Solid

alertmanager-rules-config

Manage Alertmanager rules configuration operations. Auto-activating skill in the DevOps Advanced skill category. Use when configuring systems or services. Trigger with phrases like "alertmanager rules config", "alertmanager config", or "alertmanager".

2,266 Updated today
jeremylongshore
AI & Automation Featured

prometheus-configuration

Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.

39,227 Updated today
sickn33
AI & Automation Solid

creating-alerting-rules

This skill enables Claude to create intelligent alerting rules for proactive performance monitoring. It is triggered when the user requests to "create alerts", "define monitoring rules", or "set up alerting". The skill helps define thresholds, routing, and escalation policies, and offers options for multi-category alert creation, including latency, error rate, throughput, resource utilization, availability, and SLO violation alerts. It is useful for Site Reliability Engineers (SREs) and DevOps teams looking to improve system observability.

2,266 Updated today
jeremylongshore