cohere-incident-runbook

Featured

Execute Cohere incident response procedures with triage, mitigation, and postmortem. Use when responding to Cohere API outages, investigating errors, or running post-incident reviews for Cohere integration failures. Trigger with phrases like "cohere incident", "cohere outage", "cohere down", "cohere on-call", "cohere emergency", "cohere broken".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Cohere Incident Runbook

## Overview

Rapid incident response procedures for Cohere API v2 outages. Covers triage, mitigation, communication, and postmortem for Chat, Embed, Rerank, and Classify endpoints.

## Prerequisites

- Access to [status.cohere.com](https://status.cohere.com)
- kubectl access to production cluster
- Prometheus/Grafana access
- PagerDuty/Slack communication channels

## Severity Levels

| Level | Definition | Response Time | Example |
|-------|------------|---------------|---------|
| P1 | All Cohere endpoints down | < 15 min | API returning 5xx globally |
| P2 | Degraded (rate limits, high latency) | < 1 hour | 429 errors, P95 > 10s |
| P3 | Single endpoint affected | < 4 hours | Embed works, Chat fails |
| P4 | Non-blocking issue | Next business day | Slow response, minor errors |

## Quick Triage (Run These First)

```bash
# 1. Check Cohere service status
curl -s https://status.cohere.com/api/v2/status.json | jq '.status.description'

# 2. Test each endpoint directly
echo "--- Chat ---"
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://api.cohere.com/v2/chat \
  -H "Authorization: Bearer $CO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"command-r7b-12-2024","messages":[{"role":"user","content":"ping"}]}'

echo -e "\n--- Embed ---"
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://api.cohere.com/v2/embed \
  -H "Authorization: Bearer $CO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"embed-v4....
```
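The severity table and the status codes printed by the triage commands can be combined into a first-pass severity suggestion. The following is a minimal sketch, not part of the original runbook: the function name `classify_severity`, the endpoint labels, and the `p95_seconds` parameter are hypothetical, and the rules are one reading of the table (5xx on every probed endpoint is P1, any 429 or P95 above 10 seconds is P2, 5xx on some but not all endpoints is P3, everything else is P4). The on-call engineer should still confirm the level manually.

```python
from typing import Mapping, Optional


def classify_severity(
    results: Mapping[str, int],
    p95_seconds: Optional[float] = None,
) -> str:
    """Suggest a severity level from per-endpoint HTTP status codes.

    results maps a hypothetical endpoint label (e.g. "chat") to the HTTP
    status code returned by a triage probe. p95_seconds is an optional
    P95 latency reading taken from monitoring.
    """
    # Assumption: "down" means a 5xx response, as in the table's P1 example.
    down = [name for name, code in results.items() if code >= 500]
    rate_limited = any(code == 429 for code in results.values())
    slow = p95_seconds is not None and p95_seconds > 10.0

    if results and len(down) == len(results):
        return "P1"  # every probed endpoint is returning 5xx
    if rate_limited or slow:
        return "P2"  # degraded: rate limits or high latency
    if down:
        return "P3"  # some endpoints fail while others still work
    return "P4"  # nothing blocking was detected by the probes


if __name__ == "__main__":
    probe_results = {"chat": 500, "embed": 200}
    print(classify_severity(probe_results))
```

A script that wraps the curl probes could pass the collected codes into this function and attach the suggested level to the initial incident message.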

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

intercom-incident-runbook

Execute Intercom incident response procedures with triage, mitigation, and postmortem. Use when responding to Intercom API outages, investigating integration errors, or running post-incident reviews for Intercom failures. Trigger with phrases like "intercom incident", "intercom outage", "intercom down", "intercom on-call", "intercom emergency", "intercom broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

replit-incident-runbook

Execute Replit incident response: triage deployment failures, database issues, and platform outages. Use when responding to Replit-related outages, investigating deployment crashes, or running post-incident reviews for Replit app failures. Trigger with phrases like "replit incident", "replit outage", "replit down", "replit emergency", "replit broken", "replit crash".

2,266 Updated today
jeremylongshore
AI & Automation Featured

fireflies-incident-runbook

Execute Fireflies.ai incident response with triage, remediation, and postmortem. Use when responding to Fireflies.ai API outages, auth failures, or webhook delivery problems. Trigger with phrases like "fireflies incident", "fireflies outage", "fireflies down", "fireflies on-call", "fireflies emergency", "fireflies broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

hubspot-incident-runbook

Execute HubSpot incident response with triage, mitigation, and postmortem. Use when responding to HubSpot API outages, investigating CRM errors, or running post-incident reviews for HubSpot integration failures. Trigger with phrases like "hubspot incident", "hubspot outage", "hubspot down", "hubspot on-call", "hubspot emergency", "hubspot broken".

2,266 Updated today
jeremylongshore
AI & Automation Featured

groq-incident-runbook

Execute Groq incident response: triage, mitigation, fallback, and postmortem. Use when responding to Groq-related outages, investigating errors, or running post-incident reviews for Groq integration failures. Trigger with phrases like "groq incident", "groq outage", "groq down", "groq on-call", "groq emergency", "groq broken".

2,266 Updated today
jeremylongshore