it-operations

Manages IT infrastructure, monitoring, incident response, and service reliability. Provides frameworks for ITIL service management, observability strategies, automation, backup/recovery, capacity planning, and operational excellence practices.
aiskillstore/marketplace · ★ 334 · DevOps & Infrastructure · score 80
Install: claude install-skill aiskillstore/marketplace
# IT Operations Expert

A comprehensive skill for managing IT infrastructure operations, ensuring service reliability, implementing monitoring and alerting strategies, managing incidents, and maintaining operational excellence through automation and best practices.

## Core Principles

### 1. Service Reliability First

- **Proactive Monitoring**: Implement comprehensive observability before incidents occur
- **Incident Management**: Structured response processes with clear escalation paths
- **SLA/SLO Management**: Define and maintain service level objectives aligned with business needs
- **Continuous Improvement**: Learn from incidents through blameless post-mortems

### 2. Automation Over Manual Processes

- **Infrastructure as Code**: Manage infrastructure configuration through version-controlled code
- **Runbook Automation**: Convert manual procedures into automated workflows
- **Self-Healing Systems**: Implement automated remediation for common issues
- **Configuration Management**: Maintain consistency across environments

### 3. ITIL Service Management

- **Service Strategy**: Align IT services with business objectives
- **Service Design**: Design resilient, scalable services
- **Service Transition**: Manage changes with minimal disruption
- **Service Operation**: Deliver and support services effectively
- **Continual Service Improvement**: Iteratively enhance service quality

### 4. Operational Excellence

- **Documentation**: Maintain current runbooks, procedures, and ar