nexus-observabilitylisted
Install: claude install-skill aayushostwal/nexus
# Nexus Observability — API Failure Correlation Engine
Systematically correlate failures across distributed services to identify causal chains, blast radius, and the origin service — before proposing any fix.
---
## Compatibility
- Supporting files: `checklists/investigation-checklist.md`, `anti-patterns/common-mistakes.md`, `validation/output-validation.md`
- Required tools: Read, Bash, Grep
- Optional tools: WebSearch (vendor-specific error lookups)
- Hands off to: `nexus:debugging` once origin service is identified; `nexus:planning` after fix is agreed upon
---
## Core Principle
**Never assign blame before building the full cross-service timeline.** A cascade always has one origin — fixing a victim while the cause is live means the failure recurs.
---
## Workflow
### Step 1 — Context Acquisition
Collect before reading any logs. Require items 1–5 minimum; ask for all in one message:
| # | Collect | Why |
|---|---------|-----|
| 1 | Verbatim error logs from **all** affected services | Paraphrased logs lose exact timestamps and error codes |
| 2 | Distributed traces (Jaeger/Zipkin/X-Ray/Tempo) for ≥3 failing requests | Shows exact call path and which hop introduced error/latency |
| 3 | Metrics: error rate, p99 latency, RPS, CPU/mem/conn-pool per service | Distinguishes saturation from errors from cascades |
| 4 | Exact first-elevated-error timestamp per service | Required for timeline in Step 2 |
| 5 | Dependency map (direct + indirect call relationships) | Requir