running-chaos-tests

Solid

Execute chaos engineering experiments to test system resilience. Use when testing fault tolerance and recovery in a staging or test environment. Trigger with phrases like "run chaos tests", "test resilience", or "inject failures".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Chaos Engineering Toolkit

## Overview

Execute controlled chaos engineering experiments to test system resilience, fault tolerance, and recovery capabilities. Injects failures including network latency, service crashes, resource exhaustion, and dependency outages to verify that systems degrade gracefully and recover automatically.

## Prerequisites

- Distributed system or microservice architecture deployed in a staging/test environment
- Monitoring and alerting configured (Grafana, Datadog, CloudWatch, or Prometheus)
- Rollback capability for the target environment (manual or automated)
- Chaos engineering tool installed (toxiproxy, Pumba, Litmus, or Chaos Mesh)
- Explicit approval from the team to run chaos experiments
- Steady-state hypothesis defined (what "healthy" looks like in metrics)

## Instructions

1. Define the steady-state hypothesis:
   - Identify measurable indicators of normal system behavior (e.g., p99 latency < 500ms, error rate < 0.1%, all health checks pass).
   - Record baseline metrics before injecting any failures.
   - Define the blast radius -- which services and users are affected by the experiment.
2. Design chaos experiments by category:
   - **Network**: Inject latency (200-2000ms), packet loss (5-50%), DNS failure, connection timeout.
   - **Process**: Kill a service instance, exhaust CPU or memory, fill disk.
   - **Dependency**: Block access to database, cache, or external API.
   - **State**: Corrupt data, introduce clock skew, simulate sp...
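
The steady-state hypothesis from step 1 can be expressed as a small check that runs before and during an experiment. The following is a minimal sketch, not part of the skill itself: the names `SteadyState`, `p99`, and `check_steady_state` are hypothetical, the thresholds are the example values quoted in step 1, and the metric inputs are assumed to have been pulled from whatever monitoring system is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical container for the example thresholds in step 1."""
    max_p99_latency_ms: float = 500.0   # p99 latency must stay below 500 ms
    max_error_rate: float = 0.001       # error rate must stay below 0.1%


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer arithmetic for ceil(0.99 * n) avoids floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_steady_state(latencies_ms, errors, requests, health_checks,
                       hypothesis=SteadyState()):
    """Return a list of violated indicators; an empty list means healthy."""
    violations = []

    observed_p99 = p99(latencies_ms)
    if observed_p99 >= hypothesis.max_p99_latency_ms:
        violations.append(f"p99 latency {observed_p99} ms")

    error_rate = errors / requests if requests else 0.0
    if error_rate >= hypothesis.max_error_rate:
        violations.append(f"error rate {error_rate:.2%}")

    # health_checks maps a check name to True (passing) or False (failing).
    failed = sorted(name for name, ok in health_checks.items() if not ok)
    if failed:
        violations.append("failed health checks: " + ", ".join(failed))

    return violations


if __name__ == "__main__":
    # Made-up sample data: a healthy baseline, then a degraded run.
    baseline = check_steady_state([100] * 99 + [450], 0, 1000, {"api": True})
    degraded = check_steady_state([100] * 90 + [900] * 10, 5, 1000,
                                  {"api": True, "db": False})
    print("baseline violations:", baseline)
    print("degraded violations:", degraded)
```

Recording the baseline result before injecting a failure gives a reference point: an experiment whose check returns violations that the baseline did not is a candidate for rollback.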

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

chaos-engineering

Provides chaos engineering best practices for resilience testing, fault injection, and game day planning. Use when designing resilience experiments, configuring chaos tools, planning game days, or when user mentions 'chaos engineering', 'resilience', 'litmus', 'game day', 'fault injection', 'chaos monkey', 'blast radius', 'steady state', 'failure mode'.

62 Updated today
Tibsfox
AI & Automation Solid

chaos-engineering

Design and run chaos experiments in Kubernetes — pod failures, network partitions, resource pressure with LitmusChaos and manual chaos.

14 Updated 3 days ago
sawrus
AI & Automation Listed

chaos-experiment

Design and document chaos engineering experiments. Guide steady state baseline, hypothesis formation, failure injection plans, and results analysis. Use for resilience testing, game days, failure injection experiments, and building confidence in system stability.

33 Updated today
rjmurillo
AI & Automation Solid

chaos-runner

Run chaos engineering experiments using Chaos Monkey, Litmus, or Gremlin

1,034 Updated today
a5c-ai
AI & Automation Solid

chaos-engineer

Designs chaos experiments, creates failure injection frameworks, and facilitates game day exercises for distributed systems — producing runbooks, experiment manifests, rollback procedures, and post-mortem templates. Use when designing chaos experiments, implementing failure injection frameworks, or conducting game day exercises. Invoke for chaos experiments, resilience testing, blast radius control, game days, antifragile systems, fault injection, Chaos Monkey, Litmus Chaos.

9,509 Updated 1 week ago
Jeffallan