← ClaudeAtlas

watch-runlisted

Emit-on-event health watcher for a long paired/ab benchmark run, meant to be fed to the Monitor tool (each stdout line becomes one notification) so an unattended 75-min run cannot fail silently. Coverage-first by design — it speaks up on liveness (ALIVE), on new error signatures in the proxy/tee logs (ERROR), on a stall or crash (STALL/DEAD — silence is not success), and on completion (COMPLETE) — not just the happy path. Stall is tracked on the SHADOW (A0) turn-log, which is passthrough with keep-alive OFF so every line is a real turn; the PRIMARY (A5) log also grows from keep-alive pings during idle gaps and would mask a stalled conversation, so it must not drive stall detection. Use it to babysit `.skills/paired` or `.skills/ab` runs whose wall clock exceeds the Monitor's 60-min non-persistent cap (run it persistent).
zapgun-ai/clawback · ★ 2 · AI & Automation · score 63
Install: claude install-skill zapgun-ai/clawback
# clawback run health watcher `.skills/watch-run/scripts/watch_run.sh <out-dir> [listen-port primary-port shadow-port]` Defaults: ports `8788` (tee/listen) `8790` (primary) `8791` (shadow) — the `.skills/paired` defaults. Pass the run's `--out` dir as the first arg. ## How to use (with Monitor) Launch the run in the background first (e.g. `.skills/scripts/run_paired.sh … --out runs/paired-haiku-L1`), then attach this under the **Monitor** tool with `persistent: true` (a 75-min run exceeds Monitor's 60-min non-persistent timeout). The watcher exits on its own at COMPLETE/DEAD/WATCH-END, which ends the Monitor cleanly; TaskStop it early if you abort the run. ``` Monitor(persistent=true, command=".skills/watch-run/scripts/watch_run.sh runs/paired-haiku-L1 8788 8790 8791") ``` ## Events it emits (one line = one notification) - `ALIVE a0=.. a5=..` — first turns observed; the run came up. - `ERROR(+n) … :: <last matching line>` — n new error-signature lines in `proxy.primary.log` / `proxy.shadow.log` / `tee.log` (overload, 429, EADDRINUSE/ECONN*, `→ 4xx/5xx`, Traceback, force-exit, `[error]`, …). - `STALL …` — no new SHADOW turn for 12 min but a proxy is still listening (warned once; the conversation is wedged, processes alive). - `DEAD …` — no new SHADOW turn for 12 min AND no proxy listening → the run crashed; exits 1. - `COMPLETE a0=.. a5=.. (+Nm)` — `report.md` present (analyzer finished); exits 0. - `WATCH-END` — ~95-min wall-clock backstop tripped; exits 2.