
planning-disaster-recovery

Design and implement disaster recovery strategies with RTO/RPO planning, database backups, Kubernetes DR, cross-region replication, and chaos engineering testing. Use when implementing backup systems, configuring point-in-time recovery, setting up multi-region failover, or validating DR procedures.
ancoleman/ai-design-components · ★ 368 · Web & Frontend · score 80
Install: claude install-skill ancoleman/ai-design-components
# Disaster Recovery

## Purpose

Provide comprehensive guidance for designing disaster recovery (DR) strategies, implementing backup systems, and validating recovery procedures across databases, Kubernetes clusters, and cloud infrastructure. Enable teams to define RTO/RPO targets, select appropriate backup tools, configure automated failover, and test DR capabilities through chaos engineering.

## When to Use This Skill

Invoke this skill when:

- Defining recovery time objectives (RTO) and recovery point objectives (RPO)
- Implementing database backups with point-in-time recovery (PITR)
- Setting up Kubernetes cluster backup and restore workflows
- Configuring cross-region replication for high availability
- Testing disaster recovery procedures through chaos experiments
- Meeting compliance requirements (GDPR, SOC 2, HIPAA)
- Automating backup monitoring and alerting
- Designing multi-cloud disaster recovery architectures

## Core Concepts

### RTO and RPO Fundamentals

**Recovery Time Objective (RTO):** The maximum time a service can stay down after a disaster before the business impact becomes unacceptable.

**Recovery Point Objective (RPO):** The maximum acceptable data loss, measured in time. It bounds how old the most recent restorable copy of the data may be at the moment of failure.

**Criticality Tiers:**

- **Tier 0 (Mission-Critical):** RTO < 1 hour, RPO < 5 minutes
- **Tier 1 (Production):** RTO 1-4 hours, RPO 15-60 minutes
- **Tier 2 (Important):** RTO 4-24 hours, RPO 1-6 hours
- **Tier 3 (Standard):** RTO > 24 hours, RPO > 6 hours
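The Python below is a minimal sketch of how the criticality tiers above can be applied. It maps a service's RTO and RPO requirements to a tier and checks whether a periodic backup schedule can meet a given RPO. The function names (`tier_for_rto`, `tier_for_rpo`, `classify`, `meets_rpo`) are hypothetical. Two choices are assumptions that the tier list does not state:

- Boundary values go to the stricter tier.
- The 5-15 minute RPO gap between Tier 0 and Tier 1 is assigned to Tier 0.

`meets_rpo` uses a simplified worst-case model, not a standard formula.

```python
from datetime import timedelta


def tier_for_rto(rto: timedelta) -> int:
    """Map an RTO requirement to a criticality tier (0 = most critical)."""
    if rto < timedelta(hours=1):
        return 0  # Tier 0: RTO < 1 hour
    if rto <= timedelta(hours=4):
        return 1  # Tier 1: RTO 1-4 hours (4 h boundary placed here: assumption)
    if rto <= timedelta(hours=24):
        return 2  # Tier 2: RTO 4-24 hours
    return 3  # Tier 3: RTO > 24 hours


def tier_for_rpo(rpo: timedelta) -> int:
    """Map an RPO requirement to a criticality tier (0 = most critical)."""
    # Assumption: the tier list leaves 5-15 minutes unassigned. Tier 1 (15-60 minutes)
    # cannot meet a requirement under 15 minutes, so that gap is placed in Tier 0.
    if rpo < timedelta(minutes=15):
        return 0
    if rpo <= timedelta(minutes=60):
        return 1  # Tier 1: RPO 15-60 minutes
    if rpo <= timedelta(hours=6):
        return 2  # Tier 2: RPO 1-6 hours
    return 3  # Tier 3: RPO > 6 hours


def classify(rto: timedelta, rpo: timedelta) -> int:
    """The stricter (lower-numbered) of the two per-objective tiers wins."""
    return min(tier_for_rto(rto), tier_for_rpo(rpo))


def meets_rpo(
    backup_interval: timedelta, replication_lag: timedelta, rpo: timedelta
) -> bool:
    """Simplified worst case (assumption): if the failure hits just before the next
    backup lands on its target, everything written since the previous backup is lost,
    i.e. up to one full interval plus the time that backup took to arrive."""
    return backup_interval + replication_lag <= rpo


if __name__ == "__main__":
    print(f"Tier {classify(timedelta(hours=2), timedelta(minutes=30))}")  # Tier 1
    # Hourly snapshots shipped with a 5-minute lag cannot meet a 1-hour RPO.
    print(meets_rpo(timedelta(hours=1), timedelta(minutes=5), timedelta(hours=1)))  # False
```

`classify` takes the minimum of the two per-objective tiers, so the stricter requirement drives the result. For example, a service that tolerates two hours of downtime but only two minutes of data loss still lands in Tier 0.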