Install: `claude install-skill alo-exp/silver-bullet`
# /reliability — Reliable Design Enforcement
Every design, plan, and implementation MUST handle failure gracefully. Things WILL go wrong — networks fail, disks fill up, dependencies go down, inputs are invalid. The question is not "will it fail?" but "what happens when it does?"
**Why this matters:** Unreliable systems erode user trust faster than any other quality issue. A system that crashes on bad input, hangs when a dependency is slow, or loses data on failure is not production-ready — no matter how many features it has.
**When to invoke:** During PLANNING (after `/gsd:discuss-phase`, before `/gsd:plan-phase`) and during REVIEW (as part of code review criteria). This skill applies to both new code and modifications to existing code.
---
## The Rules
### Rule 1: Every External Call Can Fail
Every network call, database query, file operation, and external service invocation MUST handle failure:
| Failure mode | Required handling |
|-------------|-------------------|
| Timeout | Explicit timeout set. Don't wait forever. |
| Connection refused | Retry with backoff, then degrade gracefully. |
| 5xx response | Retry with backoff (idempotent ops only). |
| 4xx response | Don't retry: the request itself is at fault. Log and handle based on status code (408 and 429 are the usual retryable exceptions). |
| Malformed response | Validate schema. Don't crash on unexpected shapes. |
| Partial failure | Handle incomplete writes. Don't leave data half-updated. |
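
As an illustration of the retry rows in the table above, here is a minimal Python sketch of retry with exponential backoff. It is not part of the skill itself. The names `call_with_retry`, `TransientError`, and `PermanentError` are hypothetical, as are the default attempt count and delays. The sketch assumes the wrapped operation is idempotent and sets its own explicit timeout.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeout, connection refused, 5xx)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that must not be retried (e.g. most 4xx)."""


def call_with_retry(operation, *, attempts=3, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    Assumes operation is idempotent and enforces its own timeout.
    The defaults here are illustrative, not values prescribed by the skill.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except PermanentError:
            # Retrying cannot fix a bad request; surface it immediately.
            raise
        except TransientError:
            if attempt == attempts:
                # Out of retries: re-raise so the caller can degrade gracefully.
                raise
            # Backoff ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount up to the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

A caller would wrap each external call in a function that passes an explicit timeout (for example `urllib.request.urlopen(url, timeout=5)`), translate the library's errors into `TransientError` or `PermanentError`, and fall back to a degraded response if `call_with_retry` still raises.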
**No external call without a timeout.** Default: 5s for API calls, 30s for long operations (with