oraclecloud-incident-runbook

Featured

Self-service incident runbook for OCI outages — health probes, instance recovery, cross-AD/region failover. Use when OCI instances go down, the status page is silent, or you need automated recovery without waiting for support. Trigger with "oraclecloud incident", "oci outage runbook", "oci failover", "oci instance recovery".

DevOps & Infrastructure 2,266 stars 315 forks Updated today MIT

Install

View on GitHub

Quality Score: 99/100

Stars 20%
100
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Oracle Cloud Incident Runbook ## Overview Self-service runbook for when OCI instances go down and the status page stays green. OCI's status page has a history of not acknowledging outages in real time (London Jan 2026 — 502s and instances disappearing for 10 minutes with no status update). OCI Support response times average 4+ hours for Sev-1 tickets. This runbook gives you health probes, automated instance recovery, cross-AD failover, and cross-region failover — all executable without waiting on Oracle. **Purpose:** Detect OCI service degradation independently, recover instances automatically, and fail over to alternate availability domains or regions when the primary is impacted. ## Prerequisites - **OCI CLI installed and configured** — `~/.oci/config` validated (see `oraclecloud-install-auth`) - **Python 3.8+** with the OCI SDK — `pip install oci` - **Pre-configured resources**: at least one compute instance, a VCN with subnets in multiple ADs - **IAM policies**: `manage instances`, `manage volumes`, `inspect work-requests` in the target compartment - **Boot volume backups** enabled (recovery depends on having a recent backup) ## Instructions ### Step 1: Independent Health Probes Do not trust the OCI status page alone. Run your own health checks against the OCI API: ```python import oci import time config = oci.config.from_file("~/.oci/config") def probe_oci_health(config): """Probe OCI API endpoints independently of the status page.""" results = {} ...

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

apollo-incident-runbook

Apollo.io incident response procedures. Use when handling Apollo outages, debugging production issues, or responding to integration failures. Trigger with phrases like "apollo incident", "apollo outage", "apollo down", "apollo production issue", "apollo emergency".

2,266 Updated today
jeremylongshore
AI & Automation Featured

oraclecloud-prod-checklist

Pre-production readiness checklist for OCI — backup policies, security audit, key rotation, encryption, and Cloud Guard. Use when preparing an OCI environment for production workloads or auditing an existing deployment. Trigger with "oraclecloud prod checklist", "oci production ready", "oci security audit", "oci well-architected".

2,266 Updated today
jeremylongshore
DevOps & Infrastructure Featured

oraclecloud-observability

Set up programmatic monitoring, logging, and alarms for OCI resources. Use when configuring OCI Monitoring metrics, creating alarm rules, publishing custom metrics, or searching logs via the Logging service. Trigger with "oraclecloud observability", "oci monitoring", "oci alarms", "oci logging", "oracle cloud observability".

2,266 Updated today
jeremylongshore
AI & Automation Solid

supabase-incident-runbook

Execute Supabase incident response: dashboard health checks, connection pool status, pg_stat_activity queries, RLS debugging, Edge Function logs, storage health, and escalation. Use when responding to Supabase outages, investigating production errors, debugging connection issues, or preparing evidence for Supabase support escalation. Trigger: "supabase incident", "supabase outage", "supabase down", "supabase on-call", "supabase emergency", "supabase broken", "supabase connection issues".

2,266 Updated today
jeremylongshore
AI & Automation Featured

replit-incident-runbook

Execute Replit incident response: triage deployment failures, database issues, and platform outages. Use when responding to Replit-related outages, investigating deployment crashes, or running post-incident reviews for Replit app failures. Trigger with phrases like "replit incident", "replit outage", "replit down", "replit emergency", "replit broken", "replit crash".

2,266 Updated today
jeremylongshore