creating-alerting-rules

Solid

This skill enables Claude to create alerting rules for proactive performance monitoring. It triggers when the user asks to "create alerts", "define monitoring rules", or "set up alerting". The skill helps define thresholds, routing, and escalation policies, and supports multiple alert categories: latency, error rate, throughput, resource utilization, availability, and SLO violations. It is aimed at Site Reliability Engineers (SREs) and DevOps teams who want to improve system observability.

Category: AI & Automation | 2,266 stars | 315 forks | Updated today | License: MIT


Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

## Overview

This skill automates the creation of comprehensive alerting rules, reducing the manual effort required for performance monitoring. It guides you through defining alert categories, setting intelligent thresholds, and configuring routing and escalation policies. The skill also helps generate runbooks and establish alert testing procedures.

## How It Works

1. **Identify Alert Category**: Determines the type of alert to create (e.g., latency, error rate, resource utilization).
2. **Define Thresholds**: Sets appropriate thresholds to avoid alert fatigue and ensure timely notification of performance issues.
3. **Configure Routing and Escalation**: Establishes routing policies to direct alerts to the appropriate teams and escalation policies for timely response.
4. **Generate Runbook**: Creates a basic runbook with steps to diagnose and resolve the alerted issue.

## When to Use This Skill

This skill activates when you need to:

- Implement performance monitoring for a new service.
- Refine existing alerting rules to reduce false positives.
- Create alerts for specific performance metrics, such as latency or error rate.

## Examples

### Example 1: Setting up Latency Alerts

User request: "create latency alerts for the payment service"

The skill will:

1. Prompt for latency thresholds (e.g., warning and critical).
2. Configure alerts to trigger when latency exceeds defined thresholds.

### Example 2: Creating Error Rate Alerts

User request: "set up alerting for error...
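
As a rough illustration of the threshold, routing, and escalation steps listed under "How It Works", the sketch below models a single latency rule in Python. It is not the skill's actual output or schema: the `AlertRule` fields, the threshold values, and the team names (`payments-oncall`, `sre-lead`) are all hypothetical and chosen only to mirror the payment-service example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule shape for illustration; not the skill's real schema."""
    name: str
    category: str            # e.g. "latency" or "error_rate"
    warning: float           # value at or above which a warning fires
    critical: float          # value at or above which a critical alert fires
    route: str               # team notified first
    escalate_to: str         # team notified if the alert stays unacknowledged
    escalate_after_min: int  # minutes to wait before escalating


def severity(rule: AlertRule, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for an observed metric value."""
    if value >= rule.critical:
        return "critical"
    if value >= rule.warning:
        return "warning"
    return None


def recipient(rule: AlertRule, minutes_unacknowledged: int) -> str:
    """Choose who to notify based on how long the alert has gone unacknowledged."""
    if minutes_unacknowledged >= rule.escalate_after_min:
        return rule.escalate_to
    return rule.route


# Assumed example values; real thresholds depend on the service's baseline.
payment_latency = AlertRule(
    name="payment-service-latency",
    category="latency",
    warning=0.5,    # seconds (assumed)
    critical=1.0,   # seconds (assumed)
    route="payments-oncall",
    escalate_to="sre-lead",
    escalate_after_min=15,
)

print(severity(payment_latency, 0.7))   # between warning and critical
print(recipient(payment_latency, 20))   # past the escalation window
```

Keeping separate warning and critical thresholds is one common way to limit alert fatigue: the warning level can go to a low-urgency channel, while only critical breaches page someone.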

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

monitoring-error-rates

This skill enables Claude to monitor and analyze application error rates to improve reliability. It is used when the user needs to track and understand errors occurring in their application, including HTTP errors, application exceptions, database errors, external API errors, background job errors, and client-side errors. Use this skill when the user asks to "monitor errors", "analyze error rates", "track application errors", or requests help with "error monitoring". It sets up comprehensive error tracking and alerting based on defined thresholds.

2,266 Updated today
jeremylongshore
AI & Automation Solid

setting-up-synthetic-monitoring

This skill automates the setup of synthetic monitoring for applications. It allows Claude to proactively track performance and availability by configuring uptime, transaction, and API monitoring. Use this skill when the user requests to "set up synthetic monitoring", "configure uptime monitoring", "track application performance", or needs help with "proactive performance tracking". The skill helps to identify critical endpoints and user journeys, design monitoring scenarios, and configure alerts and dashboards.

2,266 Updated today
jeremylongshore
AI & Automation Solid

alertmanager-rules-config

Manages Alertmanager rules configuration. Auto-activating skill in the DevOps Advanced skill category. Use when configuring systems or services. Trigger with phrases like "alertmanager rules config", "alertmanager config", or "alertmanager".

2,266 Updated today
jeremylongshore
AI & Automation Solid

creating-apm-dashboards

This skill enables Claude to create Application Performance Monitoring (APM) dashboards. It is triggered when the user requests the creation of a new APM dashboard, monitoring dashboard, or a dashboard for application performance. The skill helps define key metrics and visualizations for monitoring application health, performance, and user experience across multiple platforms like Grafana and Datadog. Use this skill when the user needs assistance setting up a new monitoring solution or expanding an existing one. The plugin supports the creation of dashboards focusing on golden signals, request metrics, resource utilization, database metrics, cache metrics, business metrics, and error tracking.

2,266 Updated today
jeremylongshore
AI & Automation Solid

tracking-service-reliability

This skill enables Claude to define and track Service Level Agreements (SLAs), Service Level Indicators (SLIs), and Service Level Objectives (SLOs) for improved service reliability. It is triggered when the user needs to establish, monitor, or analyze service performance metrics. Use this skill when the user mentions "SLA", "SLI", "SLO", "error budget", "service reliability", or "track service performance". The skill helps to define key metrics, set targets, and monitor performance against those targets.

2,266 Updated today
jeremylongshore