algolia-incident-runbook

Execute Algolia incident response: triage search failures, distinguish Algolia-side vs your-side issues, apply fallbacks, and run postmortems. Trigger: "algolia incident", "algolia outage", "algolia down", "algolia on-call", "algolia emergency", "algolia broken", "search is down".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# Algolia Incident Runbook

## Overview

Rapid incident response procedures for Algolia search failures. Algolia's infrastructure is distributed across multiple data centers with automatic failover, so true Algolia outages are rare. Most incidents are caused by API key issues, settings drift, or indexing pipeline failures on your side.

## Severity Classification

| Level | Definition | Response | Examples |
|-------|------------|----------|----------|
| P1 | Search completely down | < 15 min | 403/500 on all queries, Algolia unreachable |
| P2 | Degraded search | < 1 hour | High latency, partial results, stale data |
| P3 | Minor issue | < 4 hours | Analytics not updating, synonyms wrong |
| P4 | No user impact | Next day | Monitoring gap, test failures |

## Quick Triage (Run These First)

```bash
#!/bin/bash
echo "=== ALGOLIA TRIAGE ==="

# 1. Is Algolia's infrastructure up?
echo -n "Algolia Status: "
curl -s https://status.algolia.com/api/v2/status.json | jq -r '.status.description'

# 2. Can we reach our Algolia app?
echo -n "API Connectivity: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)" \
  "https://${ALGOLIA_APP_ID}-dsn.algolia.net/1/indexes" \
  -H "X-Algolia-Application-Id: ${ALGOLIA_APP_ID}" \
  -H "X-Algolia-API-Key: ${ALGOLIA_ADMIN_KEY}"
echo ""

# 3. Can we actually search?
echo -n "Search test: "
curl -s "https://${ALGOLIA_APP_ID}-dsn.algolia.net/1/indexes/products/query" \
  -H "X-Algolia-Application-Id: ${ALGOLIA_APP_ID}" \
  -H "X-Algolia-AP...
```
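
The connectivity check in the triage script prints a raw HTTP status code, which still has to be interpreted as "Algolia-side" or "your-side". A minimal sketch of that interpretation step follows. The function name `classify_http_status` and the code-to-owner mapping are assumptions made for illustration, not an official Algolia classification; adjust them to match what your integration actually observes.

```bash
#!/bin/bash
# Hypothetical helper: map the HTTP status code from the connectivity check
# to the most likely owner of the problem. The mapping below is an assumption
# for illustration, not something defined by Algolia.
classify_http_status() {
  local code="$1"
  case "$code" in
    2??)     echo "ok" ;;
    401|403) echo "your-side: check API key and ACLs" ;;
    404)     echo "your-side: check index name" ;;
    429)     echo "your-side: rate limited" ;;
    5??)     echo "algolia-side: check status page" ;;
    000)     echo "unreachable: check network and DNS" ;;  # curl reports 000 when no HTTP response was received
    *)       echo "unknown: investigate manually" ;;
  esac
}
```

To use it, capture the code with `curl -s -o /dev/null -w '%{http_code}'` against the same endpoint as the connectivity check and pass the result to `classify_http_status`. A verdict of "your-side" on every query would still be treated as P1 under the severity table, but it points the responder at keys and configuration rather than at the status page.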

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT
