resilience-analysis

Assess error handling, isolation boundaries, and recovery mechanisms in agent frameworks. Use when (1) tracing error propagation paths, (2) evaluating sandboxing for code execution, (3) understanding retry and fallback mechanisms, (4) assessing production readiness, or (5) identifying failure modes and recovery patterns.
aiskillstore/marketplace · ★ 329 · AI & Automation · score 79
Install: claude install-skill aiskillstore/marketplace
# Resilience Analysis

Assesses error handling and isolation boundaries.

## Process

1. **Trace error propagation**: Map exception flow from tools to agent
2. **Identify isolation**: Sandbox mechanisms for dangerous operations
3. **Catalog recovery**: Retry logic, fallbacks, circuit breakers
4. **Assess boundaries**: Which crashes propagate and which are contained

## Error Propagation Analysis

### Questions to Answer

1. Does a tool exception terminate the agent?
2. Are LLM API errors retried automatically?
3. Is a parsing failure (malformed output) recoverable?
4. What happens when state updates fail?

### Propagation Patterns

**Crash Propagation (Dangerous)**

```python
def run_tool(self, tool, args):
    return tool.execute(args)  # Exception bubbles up
```

**Exception Wrapping**

```python
def run_tool(self, tool, args):
    try:
        return tool.execute(args)
    except Exception as e:
        # Re-raise as a domain-specific error; the original is kept as __cause__
        raise ToolExecutionError(tool.name, e) from e
```

**Error Containment**

```python
def run_tool(self, tool, args):
    try:
        return ToolResult(success=True, output=tool.execute(args))
    except Exception as e:
        # The failure is returned as data, so the agent loop keeps running
        return ToolResult(success=False, error=str(e))
```

### Propagation Map Template

```
User Input
    ↓
┌─────────────────────────────────────────┐
│ Agent Loop                              │
│     ↓                                   │
│ ┌─────────────────────────────────────┐ │
│ │ LLM Call                            │ │
│ │ • APIError → [Retry 3x / Propagate