groq-incident-runbook

Execute Groq incident response: triage, mitigation, fallback, and postmortem. Use when responding to Groq-related outages, investigating errors, or running post-incident reviews for Groq integration failures. Trigger with phrases like "groq incident", "groq outage", "groq down", "groq on-call", "groq emergency", "groq broken".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Groq Incident Runbook

## Overview

Rapid incident response procedures for Groq API failures. Groq is a third-party inference provider, so when it goes down your mitigation options are: wait, fall back to a different model, or fall back to a different provider.

## Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete API failure | < 15 min | Groq API returns 5xx on all models |
| P2 | Degraded performance | < 1 hour | High latency, partial 429s, one model down |
| P3 | Minor impact | < 4 hours | Intermittent errors, non-critical feature affected |
| P4 | No user impact | Next business day | Monitoring gap, cost anomaly |

## Quick Triage (Run First)

```bash
set -euo pipefail

echo "=== 1. Groq API Status ==="
curl -sf https://status.groq.com > /dev/null && echo "status.groq.com: REACHABLE" || echo "status.groq.com: UNREACHABLE"
echo ""

echo "=== 2. API Authentication ==="
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
  https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer $GROQ_API_KEY")
echo "GET /models: HTTP $HTTP_CODE"
echo ""

echo "=== 3. Model Availability ==="
for model in "llama-3.1-8b-instant" "llama-3.3-70b-versatile"; do
  CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    https://api.groq.com/openai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"$model\",\"messages\":[{\"role\"...
```
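The severity table and the three mitigation options can be turned into two small helpers: one that maps the per-model HTTP codes from the triage loop to a suggested severity, and one that walks a preference-ordered list of fallbacks. This is a minimal sketch, assuming bash 4+ and curl. The function names `classify_groq_severity`, `first_healthy`, and `groq_model_ok` are hypothetical, the code-to-severity mapping (including the extra `AUTH`, `OK`, and `UNKNOWN` labels) is one reading of the table rather than an official Groq rule, and the one-token request body in `groq_model_ok` is an assumed OpenAI-style payload, not the truncated one from the script above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: suggest a severity from the HTTP codes collected per model.
# Assumed mapping: any 401/403 -> AUTH (key problem on your side),
# 5xx on every model -> P1, any other non-200 -> P2, all 200 -> OK.
# A 000 (curl could not connect) counts as non-200 but not as 5xx.
classify_groq_severity() {
  local total=0 ok=0 server_err=0 auth_err=0 code
  for code in "$@"; do
    total=$((total + 1))
    case "$code" in
      200) ok=$((ok + 1)) ;;
      401|403) auth_err=$((auth_err + 1)) ;;
      5??) server_err=$((server_err + 1)) ;;
    esac
  done
  if [ "$total" -eq 0 ]; then
    echo "UNKNOWN"   # no codes supplied
  elif [ "$auth_err" -gt 0 ]; then
    echo "AUTH"      # check GROQ_API_KEY before declaring a Groq outage
  elif [ "$server_err" -eq "$total" ]; then
    echo "P1"        # every model checked returned 5xx
  elif [ "$ok" -lt "$total" ]; then
    echo "P2"        # partial failure: 429s, one model down, etc.
  else
    echo "OK"        # every model answered 200
  fi
}

# Hypothetical helper: print the first candidate for which the check command
# succeeds, or return 1 if none do. Order candidates by preference.
first_healthy() {
  local check="$1" candidate
  shift
  for candidate in "$@"; do
    if "$check" "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example check (needs network and GROQ_API_KEY): a one-token chat completion.
# '|| true' keeps set -e from aborting when curl cannot connect (code is then 000).
groq_model_ok() {
  local code
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
    https://api.groq.com/openai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"$1\",\"messages\":[{\"role\":\"user\",\"content\":\"ping\"}],\"max_tokens\":1}") || true
  [ "$code" = "200" ]
}
```

For example, `first_healthy groq_model_ok llama-3.3-70b-versatile llama-3.1-8b-instant` prints the first model that answers with HTTP 200 and returns non-zero if neither does; at that point falling back to a different provider is the remaining option.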

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

cohere-incident-runbook

Execute Cohere incident response procedures with triage, mitigation, and postmortem. Use when responding to Cohere API outages, investigating errors, or running post-incident reviews for Cohere integration failures. Trigger with phrases like "cohere incident", "cohere outage", "cohere down", "cohere on-call", "cohere emergency", "cohere broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

apollo-incident-runbook

Apollo.io incident response procedures. Use when handling Apollo outages, debugging production issues, or responding to integration failures. Trigger with phrases like "apollo incident", "apollo outage", "apollo down", "apollo production issue", "apollo emergency".

2,266 Updated today
jeremylongshore
AI & Automation Featured

klaviyo-incident-runbook

Execute Klaviyo incident response procedures with triage, mitigation, and postmortem. Use when responding to Klaviyo-related outages, investigating API errors, or running post-incident reviews for Klaviyo integration failures. Trigger with phrases like "klaviyo incident", "klaviyo outage", "klaviyo down", "klaviyo on-call", "klaviyo emergency", "klaviyo broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

intercom-incident-runbook

Execute Intercom incident response procedures with triage, mitigation, and postmortem. Use when responding to Intercom API outages, investigating integration errors, or running post-incident reviews for Intercom failures. Trigger with phrases like "intercom incident", "intercom outage", "intercom down", "intercom on-call", "intercom emergency", "intercom broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

algolia-incident-runbook

Execute Algolia incident response: triage search failures, distinguish Algolia-side vs your-side issues, apply fallbacks, and run postmortems. Trigger: "algolia incident", "algolia outage", "algolia down", "algolia on-call", "algolia emergency", "algolia broken", "search is down".

2,266 Updated today
jeremylongshore